CLONING, EXPRESSION AND DIAGNOSIS OF
HUMAN CYTOCHROME P450 2C19:
THE PRINCIPAL DETERMINANT OF S-MEPHENYTOIN METABOLISM



TECHNICAL FIELD 
The present invention relates generally to isolation
and exploitation of a novel member of the cytochrome P450 2C
subfamily of enzymes, 2C19, which is shown to be the principal
determinant of human S-mephenytoin metabolism. The
invention also relates to the isolation and exploitation of an
additional member of this family designated 2C18.

BACKGROUND OF THE INVENTION

The cytochromes P450 are a large family of
hemoprotein enzymes capable of metabolizing xenobiotics such
as drugs, carcinogens and environmental pollutants as well as
endobiotics such as steroids, fatty acids and prostaglandins.
Some members of the cytochrome P450 family are inducible in
both animals and cultured cells, while other forms are
constitutive. This group of enzymes has both harmful and
beneficial activities. Metabolic conversion of xenobiotics to
toxic, mutagenic and carcinogenic forms is a harmful activity.
Detoxification of some drugs and other xenobiotic substances
is a beneficial activity (Gelboin, Physiol. Rev. 60:1107-1).
A further beneficial activity is the metabolic processing of
some drugs to activated forms that have pharmacological
activity.

Genetic polymorphisms of P450 enzymes result in
phenotypically distinct subpopulations that differ in their
ability to perform particular drug biotransformation
reactions. These phenotypic distinctions have important
implications for selection of drugs. For example, a drug that
is safe when administered to most humans may cause intolerable
side-effects in an individual suffering from a defect in a
P450 enzyme required for detoxification of the drug.
Alternatively, a drug that is effective in most humans may be
ineffective in a particular subpopulation because of lack of a
P450 enzyme required for conversion of the drug to a
metabolically active form. Accordingly, it is important for
both drug development and clinical use to screen drugs to
determine which P450 enzymes are required for activation
and/or detoxification of the drug. It is also important to
identify individuals who are deficient in a particular P450
enzyme.

A cytochrome P450 polymorphism of particular concern
results in reduced levels of S-mephenytoin 4'-hydroxylase
activity in certain subpopulations (Küpfer et al., Eur. J.
Clin. Pharmacol. 26:753-759 (1984); Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)). Two phenotypes, extensive
and poor metabolizers, are present in the human population.
Poor metabolizers are detected at low frequencies in
Caucasians (2-5%) but at higher frequencies in the Oriental
population (~20%) (Nakamura et al., Clin. Pharmacol. Ther.
38:402-408 (1985); Jurima et al., Br. J. Clin. Pharmacol.
19:483-487 (1985)) and blacks (~12%). 4'-Hydroxylation of
S-mephenytoin is 3-10 fold higher than that of the R-enantiomer
in extensive metabolizers, but the ratio is approximately 1 or
less in poor metabolizers (Yasumori et al., Mol. Pharmacol.
35:443-449 (1990)). Rates of S-mephenytoin 4'-hydroxylation in
liver microsomes are also much higher than those of
R-mephenytoin in extensive metabolizers.

There is some evidence that S-mephenytoin 4'-
hydroxylase activity resides in the cytochrome P450 2C family
of enzymes. A number of 2C human variants (designated 2C8,
2C9 and 2C10) have been partially purified and/or cloned.
See Shimada et al., J. Biol. Chem. 261:909-921 (1986); Kawano
et al., J. Biochem. (Tokyo) 102:493-501 (1987); Gut et al.,
Biochem. Biophys. Acta 884:435-447 (1986); Beaune et al.,
Biochem. Biophys. Acta 840:364-370 (1985); Ged et al.,
Biochemistry 27:6929-6940 (1988); Umbenhauer et al.,
Biochemistry 26:1094-1099 (1987); Kimura et al., Nucleic
Acids Res. 15:10053-10054 (1987); Shephard et al., Ann. Hum.
Genet. 53:23-31 (1989); Yasumori et al., J. Biochem. 102:1075-
1082 (1987); Relling et al., J. Pharmacol. Exper. Ther.
252:442-447 (1990).
A comparison of the P450 2C cDNAs and their predicted amino
acid sequences shows that about 70% of the amino acids are
absolutely conserved among the human P450 2C subfamily. Some
regions of human P450 2C protein sequences are particularly
highly conserved, and these regions may participate in
common P450 functions. Other regions show greater sequence
divergence and are likely responsible for different
substrate specificities between 2C members.

There has been considerable controversy as to
whether any of the known 2C members encodes the principal
human determinant of S-mephenytoin 4'-hydroxylase activity, in
which the polymorphism discussed above presumably resides.
The multiplicity and common properties of cytochromes P450
make it difficult to separate their different forms,
especially the minor forms. Even in situations where P450
cytochromes have been isolated in purified form by
conventional enzyme purification procedures, they have been
removed from the natural biological membrane association and
therefore require the addition of NADPH-cytochrome P450
reductase and other cell fractions for enzymatic activity.

The known members of the cytochrome P450 2C family
exhibit only low levels of S-mephenytoin 4'-hydroxylase
activity, if any. Moreover, such low levels of activity are
not specific for the S-enantiomer. For example, when the cDNA
isolated by Kimura et al. (1987), supra, was expressed in
HepG2 cells, it metabolized racemic and (R)-mephenytoin but
had no (S)-mephenytoin hydroxylase activity, suggesting that
the polymorphism in the metabolism of (S)-mephenytoin resides
in a different member of the P450 family. As a further
example, Yasumori et al. (1991), supra, reported that an
allelic variant of 2C9 (Arg144 Tyr358 Ile359 Gly417) showed a low
level of catalytic activity toward S-mephenytoin in a cDNA-
directed yeast expression system. However, Srivastava et al., Mol.
Pharmacol. 40:69-69 (1991) expressed an identical cDNA in
yeast and an Arg144 Cys358 Ile359 Asp417 variant (2C10 by present
nomenclature) but were unable to demonstrate catalytic
activity of 2C9 or 2C10 toward S-mephenytoin. Relling et al.,
J. Pharmacol. Exper. Ther. 252:442-447 (1990), were also
unable to demonstrate catalytic activity of an allelic variant
of 2C9 (Cys144 Tyr358 Ile359 Gly417) toward S-mephenytoin using a
retroviral cDNA expression system in HepG2 cells. In
contrast, all of these 2C9 variants metabolized tolbutamide in
the various expression systems, confirming that failure to
observe S-mephenytoin 4'-hydroxylase activity was not due to
deficiencies in the expression system.

Based on the foregoing, it is apparent that a need
exists to identify and isolate the P450 2C family member
representing the principal determinant of S-mephenytoin 4'-
hydroxylase activity in humans. There is also a need for
stable cell lines expressing the S-mephenytoin 4'-hydroxylase
activity. A need is also apparent for methods of screening
drugs for safety and efficacy in individuals deficient in
S-mephenytoin 4'-hydroxylase activity. There is also a need for
methods for diagnosing individuals deficient in S-mephenytoin
4'-hydroxylase activity. The present invention fulfills these
and other needs.



SUMMARY OF THE INVENTION 
The invention provides purified cytochrome P450 2C19
polypeptides. The amino acid sequence of an exemplary P450
2C19 polypeptide is designated SEQ. ID. No. 1. Other
cytochrome P450 2C19 polypeptides usually comprise an amino
acid sequence having at least 97% sequence identity with the
exemplified sequence. Many of the 2C19 polypeptides of the
invention exhibit stereospecific S-mephenytoin 4'-hydroxylase
activity. The activity is typically at least about 1 nmol
mephenytoin per nmol of the purified polypeptide per minute.

The invention also provides purified cytochrome P450
2C18 polypeptides. The amino acid sequences of exemplary 2C18
polypeptides are designated SEQ. ID. Nos. 5 and 11.

In another aspect of the invention, purified DNA
segments encoding the P450 2C19 polypeptides described above
are provided. Some DNA segments encode the exemplary P450
2C19 having the amino acid sequence designated SEQ. ID.
No. 1. One such exemplary DNA segment is designated SEQ. ID.
No. 2. Other DNA segments encode the P450 2C18 polypeptides
described above. Exemplary DNA segments are designated SEQ.
ID. Nos. 6 and 12.

In a further aspect of the invention stable cell 
lines are provided. The cell lines comprise an exogenous DNA 
segment encoding a cytochrome P450 2C19 polypeptide having at 
least 97% sequence identity with the amino acid sequence 
designated SEQ. ID. No. 1. The DNA segment is capable of 
being expressed in the cell line. Cell lines preferably 
produce high levels of the P450 2C19 polypeptide, such as
10-200 pmol of the polypeptide per mg of total microsomal
protein. Preferred cell lines are eukaryotic, including yeast 
and insect cells. 

The invention also provides methods of producing a 
cytochrome P450 2C19 polypeptide. In these methods, a stable 
cell line, as described above, is cultured under conditions 
such that the DNA segment contained in the cell line is 
expressed. 

The invention also provides antibodies that 
specifically bind to a 2C19 polypeptide comprising the amino 
acid sequence designated SEQ. ID. No. 1. Preferred antibodies 
are incapable of binding to nonallelic forms of 2C 
polypeptides, such as 2C9. 

In another aspect, the invention provides methods of
screening for a drug that is metabolized by S-mephenytoin 4'-
hydroxylase activity. The drug is contacted with a cytochrome
P450 2C19 polypeptide. A metabolic product resulting from an
interaction between the drug and the polypeptide is detected.
The presence of the product indicates that the drug is
metabolized by the S-mephenytoin 4'-hydroxylase activity. The
cytochrome P450 2C19 used in the methods may be substantially
pure or may be a component of a lysate of a stable cell line.
The cytochrome P450 2C19 polypeptide may also be a component
of an intact stable cell line. Some methods further comprise
the steps of contacting the drug with a liver extract
comprising a mixture of cytochrome P450 polypeptides, and
detecting a metabolic product resulting from an interaction
between the drug and the mixture of cytochrome P450
polypeptides.

The invention also provides methods of identifying a
mutagenic, carcinogenic or cytotoxic compound. In some
methods, the compound is contacted with a stable cell line
capable of expressing a 2C19 polypeptide, such as described
above. Mutagenic, carcinogenic or cytotoxic effects of the
compound on the cell line are assayed. In other methods, the
compound is contacted with a cytochrome P450 2C19 polypeptide
in a reaction mixture. A metabolic product is generated
resulting from S-mephenytoin 4'-hydroxylase activity on the
compound. The metabolic product is assayed for mutagenic,
carcinogenic or cytotoxic effects on a test cell line. The
effects indicate that the compound is mutagenic, carcinogenic
or cytotoxic. In some methods, the test cell line is added to
the reaction mixture before, during or after the contacting
step. The 2C19 polypeptide used in these methods can be
substantially pure or a component of a lysate of a stable cell
line. The 2C19 polypeptide can also be a component of an
intact stable cell line. Salmonella typhimurium is a
preferred cell line.

The invention also provides methods for testing the
chemopreventive activity of an agent. A stable cell line
capable of expressing a 2C19 polypeptide, such as described
above, is contacted with an agent suspected of being
chemopreventive in the presence of a carcinogen. The agent
can be contacted with the cell line before addition of the
carcinogen. Effects of the agent on the cell line that are
indicative of chemopreventive activity are monitored.

The invention also provides methods for determining
the metabolites activated by a carcinogen or xenobiotic. A
stable cell line capable of expressing a 2C19 polypeptide,
such as described above, is contacted with the suspected
carcinogen or xenobiotic. Metabolites and/or their effects
are identified.

The invention also provides methods of detecting a
cytochrome P450 2C19 polypeptide in a tissue sample. The tissue
sample is contacted with an antibody that specifically binds
to the 2C19 polypeptide, preferably without specifically
binding to nonallelic variants such as 2C9. Specific binding
between the antibody and the polypeptide is detected to
indicate the presence of the polypeptide.

In another aspect of the invention, methods of
diagnosing a patient having a deficiency in S-mephenytoin 4'-
hydroxylase activity are provided. In these methods, a sample
of nucleic acids is obtained from the patient, and
a cytochrome P450 2C19 DNA sequence from the nucleic acids in
the sample is analyzed for the presence of a polymorphism
indicative of the deficiency. The most frequently occurring
polymorphisms in the P450 2C19 genes occur at nucleotides 681
and 636 of the 2C19 gene.

In some methods, the P450 2C19 DNA sequence subject
to analysis is genomic. In such methods, an amplifying step
is often primed from a forward primer sufficiently
complementary with a first subsequence of the antisense strand
of the 2C19 sequence to hybridize therewith, and a reverse
primer sufficiently complementary to a second subsequence of
the sense strand of the 2C19 sequence to hybridize therewith.

Some methods detect a polymorphism at nucleotide 681
of the coding region of the P450 2C19 DNA genomic sequence.
This can be achieved by selecting a forward primer that
hybridizes upstream from nucleotide 681 of the coding region,
and a reverse primer that hybridizes downstream from
nucleotide 681 of the coding region. Amplification products
generated from these primers can be analyzed by digesting the
amplified DNA segment with a restriction enzyme that
recognizes a site that includes nucleotide 681 of the coding
region.
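
By way of illustration only, the restriction analysis of an
amplification product can be sketched in Python as follows. The
sketch assumes SmaI (recognition sequence CCCGGG, cut between CCC
and GGG) as the enzyme, consistent with the SmaI digests of
Figs. 14 and 15. The two amplicon strings are hypothetical toy
sequences chosen for the example and are not 2C19 sequence; the
function names are likewise illustrative.

```python
# Illustrative sketch only: toy amplicons, not actual 2C19 sequence.
SMAI_SITE = "CCCGGG"  # SmaI recognition sequence; blunt cut after CCC


def digest(amplicon, site=SMAI_SITE):
    """Return fragment lengths after complete digestion at every site."""
    cut_offset = len(site) // 2  # SmaI cuts in the middle of its site
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def allele_from_digest(amplicon):
    """Call the allele: a cut product means the site is intact."""
    return "wildtype" if len(digest(amplicon)) > 1 else "mutant"


# Hypothetical amplicons differing by a single G->A change in the site.
WT_AMPLICON = "ATGCATTTCCCGGGAACCATAG"
MUT_AMPLICON = "ATGCATTTCCCAGGAACCATAG"

if __name__ == "__main__":
    for name, seq in (("wt", WT_AMPLICON), ("mut", MUT_AMPLICON)):
        print(name, digest(seq), allele_from_digest(seq))
```

In this simplified model an undigested product indicates loss of
the restriction site, and a heterozygous sample would show both
the cut and uncut patterns.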

Other methods detect a polymorphism at nucleotide
636 of the coding region of the P450 2C19 DNA genomic
sequence. This can be achieved using a forward primer that
hybridizes upstream from nucleotide 636 of the coding region,
and a reverse primer that hybridizes downstream of nucleotide
636 of the coding region. Amplification products are
conveniently analyzed by digestion with an enzyme that
recognizes a site that includes nucleotide 636 of the coding
region.

Other methods detect the 681 polymorphism by a
different approach involving selective amplification of the
wildtype or mutant allele. For example, for selective
amplification of the wildtype allele, a suitable forward
primer has about 10-50 contiguous nucleotides from the
wildtype 2C19 sequence shown in Fig. 16 including the
nucleotide at position 681 of the coding region. The forward
primer primes amplification from the complement of the
wildtype 2C19 sequence without priming amplification from the
complement of the mutant 2C19 sequence shown in Fig. 16.
Preferably, the 3' nucleotide of the forward primer is the
nucleotide at position 681. Analogously, the 681 mutant
allele can be amplified using a forward primer having
about 10-50 contiguous nucleotides from the mutant 2C19
sequence shown in Fig. 16 including the nucleotide at position
681 of the coding sequence. The forward primer primes
amplification from the complement of the mutant 2C19 sequence
without priming amplification from the complement of the
wildtype 2C19 sequence shown in Fig. 16.
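
The selection of an allele-specific forward primer whose 3'
nucleotide lies at the polymorphic position can be sketched as
follows. This is a minimal illustration under stated assumptions:
the sequences are hypothetical toy strings rather than the
sequences of Fig. 16, the polymorphic position is given as a
0-based index into the toy sequence, and priming is modeled
simply as an exact match of the primer to the sense strand, which
ignores real hybridization thermodynamics.

```python
# Illustrative sketch only: toy sequences and a simplified priming model.

def allele_specific_forward_primer(sense, snp_index, length=20):
    """Return a primer of `length` nt whose 3' nucleotide is sense[snp_index]."""
    if not 10 <= length <= 50:
        raise ValueError("primer length should be about 10-50 nucleotides")
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this length")
    return sense[start:snp_index + 1]


def primes(primer, sense):
    """Simplified model: priming requires an exact match to the sense strand
    (i.e., perfect annealing to its complement, including the 3' base)."""
    return primer in sense


# Hypothetical sense strands differing only at index 16 (G in wt, A in mutant).
WT_SENSE = "GATTACAGGCTTACCCGGGAACC"
SNP_INDEX = 16
MUT_SENSE = WT_SENSE[:SNP_INDEX] + "A" + WT_SENSE[SNP_INDEX + 1:]

wt_primer = allele_specific_forward_primer(WT_SENSE, SNP_INDEX, length=12)
mut_primer = allele_specific_forward_primer(MUT_SENSE, SNP_INDEX, length=12)

if __name__ == "__main__":
    print(wt_primer, primes(wt_primer, WT_SENSE), primes(wt_primer, MUT_SENSE))
    print(mut_primer, primes(mut_primer, MUT_SENSE), primes(mut_primer, WT_SENSE))
```

Placing the discriminating nucleotide at the 3' end of the primer
reflects the preference stated above; the length of 12 used here
is only for compactness of the example.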

The invention also provides analogous methods for 
detection of the 636 polymorphism. 

In other methods, the segment of 2C19 DNA subject to
analysis is a cDNA sequence. cDNA is produced by reverse
transcribing mRNA in the sample to produce the cDNA sequence.
In some methods for detecting the 681 polymorphism, the
forward primer comprises about 10-50 contiguous nucleotides
upstream of nucleotide 643 of the coding region of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the complement of the 2C19 sequence upstream from nucleotide
643 of the coding region, and the reverse primer comprises
about 10-50 contiguous nucleotides from the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 and hybridizes to
the 2C19 sequence downstream from nucleotide 682 of the coding
region. In other methods, the forward primer hybridizes to
the complement of the wildtype 2C19 cDNA sequence shown in
Fig. 12 between nucleotides 643 and 682 without hybridizing to
the complement of the mutant 2C19 cDNA sequence shown in
Fig. 12. In other methods, the reverse primer hybridizes to
the wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643 and 682 without hybridizing to the mutant 2C19
cDNA sequence shown in Fig. 12.

The invention provides analogous methods for
diagnosing the 636 polymorphism from cDNA. In some methods,
the forward primer comprises about 10-50 contiguous
nucleotides upstream of nucleotide 636 of the coding region of
the wildtype 2C19 cDNA sequence shown in Fig. 12, and the
reverse primer comprises about 10-50 contiguous nucleotides
from the complement of the wildtype 2C19 cDNA sequence shown
in Fig. 12 downstream from nucleotide 636 of the coding
region.

The invention also provides methods capable of
detecting any polymorphism from cDNA. In these methods, the
full-length 2C19 cDNA sequence is usually amplified. Analysis
is often performed by sequencing a segment of the 2C19 cDNA
amplification product.

The invention provides further methods for
diagnosing polymorphisms in genomic DNA. In these methods,
genomic DNA is digested with a restriction enzyme that
recognizes a site that includes nucleotide 636 or 681 of the
coding region. The digestion products are then detected by
Southern blotting with a labelled segment of the 2C19 DNA
sequence as a probe.

In another aspect of the invention, diagnostic kits
are provided. Some diagnostic kits comprise forward and
reverse primers. The forward primer is sufficiently
complementary with a first subsequence of the antisense strand
of a double-stranded 2C19 genomic DNA sequence to hybridize
therewith, and the reverse primer is sufficiently complementary
with a second subsequence of the sense strand of the 2C19
genomic sequence to hybridize therewith. For example, in some
methods for diagnosis of the 681 polymorphism, the first
subsequence is upstream of nucleotide 681 of the coding
region, and the second subsequence is downstream of nucleotide
681 of the coding region. Similarly, in some methods for
diagnosis of the 636 polymorphism, the first subsequence is
upstream of nucleotide 636 of the coding region, and the
second subsequence is downstream of nucleotide 636 of the
coding region.


BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows Western blots of human liver
microsomal proteins. Microsomal proteins were separated by
SDS-polyacrylamide gel electrophoresis. Blot A was performed
using polyclonal antibody to 2C9 and blot B with anti-2C8
(HLx). Each lane represents 20 μg of microsomal protein from
an individual liver. The 2C8 antibody also recognized
purified rat P450 2C13 (g). cDNA libraries were constructed
from livers 860624 (low HLx) and S33 (high HLx).

Figure 2 contains nucleotide sequences of human P450
2C cDNAs. 2c (SEQ. ID. No. 14) is indicated in the top line
and represents the consensus sequence where information from
more than one sequence is available. Sequences were
determined by the dideoxy chain termination method. The
differences observed for clones 25 (SEQ. ID. No. 4) and 65
(SEQ. ID. No. 10) are underlined. The termination codons are
starred. The heme binding region and polyadenylation signals
are underlined. The one-base difference between 29c (SEQ. ID.
No. 6) and 6b (SEQ. ID. No. 12) is also underlined. The
termination codon is starred. The new allelic variant
proteins of 2C18, referred to as 29c (SEQ. ID. No. 5) and 6b
(SEQ. ID. No. 11), and the new protein of 2C19, referred to as
11a (SEQ. ID. No. 1), are compared with the protein of 2C8,
referred to as 2C8 (SEQ. ID. No. 7), and the allelic variant
proteins of 2C9, referred to as 65 (SEQ. ID. No. 9) and 25
(SEQ. ID. No. 3).

Figure 3 depicts a comparison of amino acid
sequences of cytochrome P450 2C8 allelic variants.

Figure 4 depicts a Western blot of recombinant
transformed COS-1 cells. Each lane represents microsomal
protein (50 μg) from an independent transformation with the
indicated P450 2C cDNA, mock-transfected cells (CON), 20 μg of
human liver microsomal protein (liver S5), or 2 pmol of pure
P450g (2C13).

Figure 5 shows a Northern blot of human mRNAs. Each
lane represents 10 μg of mRNA, and the blot was probed with
end-labeled MOOR, an oligoprobe specific for 2C8 (SEQ. ID.
No. 8) (top), stripped, and reprobed with 32P-actin cDNA
(bottom).

Figure 6: Western blots of yeast microsomes
expressing recombinant P450 2C cDNAs. CON = control (yeast
microsomes lacking recombinant proteins).

Figure 7: Linearity of S-mephenytoin 4'-hydroxylase
activity and amount of recombinant cytochrome P450 2C19.

Figure 8: S-mephenytoin 4'-hydroxylase activity as
a function of the molar ratio of cytochrome b5 to recombinant
cytochrome P450.

Figure 9: HPLC radiochromatograms of metabolites 
formed after incubation of labelled mephenytoin with P450 2C 
enzymes, human liver microsomes and yeast control. 

Figure 10: Comparison of liver content of
cytochrome P450 2C enzymes with S-mephenytoin 4'-hydroxylase
activity. The upper part of the figure shows Western blots of
liver samples from 16 individuals. The lower part of the
figure shows the S-mephenytoin 4'-hydroxylation activity and
ratios of S/R mephenytoin 4'-hydroxylase activity in each
sample.

Figure 11: Correlation between hepatic 2C19 content 
and S-mephenytoin hydroxylase activity based on the data shown 
in Figure 10. 

Figure 12: Sequence alignment of PCR products from
normal and aberrantly spliced CYP2C19 cDNAs (SEQ. ID. Nos. 45
and 47), with the corresponding amino acid translations (SEQ.
ID. Nos. 46 and 48) indicated above and below the nucleotide
sequence. The new termination codon TAA in the aberrant cDNA
is indicated by the word END and the asterisk. The PCR
primers are indicated by the horizontal arrows in the
sequence. The aberrant CYP2C19 cDNA is missing 40 base pairs
of the cDNA in poor metabolizers as indicated by the dotted
line.

Figure 13: A. Diagram of strategy to amplify
CYP2C19 cDNA transcripts from human liver samples. The
sequence for the PCR primers is indicated in Fig. 12. This
strategy yielded a 284 bp band for the normal cDNA, a 244 bp
band for the aberrant cDNA and both bands with cDNA from
heterozygous individuals. The hatched area indicates the 40
bp deleted in exon 5 of the aberrant cDNA. B. Relation
between genotype as assessed by reverse transcription PCR (RT-
PCR) of human liver mRNA, CYP2C19 protein estimated by
immunoblotting, S-mephenytoin hydroxylation activity, and the
ratio of metabolism of the R/S enantiomers. In vitro
phenotype was based on high (E), intermediate (I) or low (P)
S-mephenytoin 4'-hydroxylase activity.
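
The interpretation of the RT-PCR bands described for Figure 13A
can be sketched as a simple lookup. The band sizes (284 bp and
244 bp) are taken from the description above; the size tolerance
and the function and label names are assumptions introduced only
for this illustration.

```python
# Illustrative sketch: genotype call from observed RT-PCR band sizes.
NORMAL_BP = 284    # product from normally spliced CYP2C19 cDNA
ABERRANT_BP = 244  # product lacking the 40 bp deleted from exon 5


def genotype_from_bands(band_sizes, tolerance=5):
    """Classify a sample from its band sizes (bp).

    `tolerance` is a hypothetical allowance for gel sizing error.
    """
    has_normal = any(abs(b - NORMAL_BP) <= tolerance for b in band_sizes)
    has_aberrant = any(abs(b - ABERRANT_BP) <= tolerance for b in band_sizes)
    if has_normal and has_aberrant:
        return "heterozygous"
    if has_normal:
        return "homozygous normal"
    if has_aberrant:
        return "homozygous aberrant"
    return "undetermined"


if __name__ == "__main__":
    for bands in ([284], [244], [284, 244], [500]):
        print(bands, genotype_from_bands(bands))
```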

Figure 14: A. Diagram showing strategy used to
genotype genomic DNA from human blood. B. Diagram of family
of propositus 61 (arrow) showing the pedigree and the gel of
SmaI-digested PCR products. C. Analysis of genomic DNA from
selected Caucasian subjects from the United States or from
Switzerland. The phenotype (EM, IM or PM) is indicated in the
brackets above the gel. D. Analysis of genomic DNA from
selected Oriental subjects.

Figure 15: A. Partial sequence of the intron
4/exon 5 junction of CYP2C19 in extensive and poor
metabolizers (SEQ. ID. Nos. 49 and 50). Intron sequences are
shown in lower case and exon sequences in capitals. The
nucleotides deleted in the aberrantly spliced cDNA are
indicated in bold. The polymorphic SmaI site is underlined in
2C19 (wt). The highly conserved AG residues at the intron/exon
junction are shown in black boxes. The consensus sequence
(11YNCAGG) (Y=pyrimidine, R=purine, N=any base) for the 3'
splice site is indicated underneath the normal and cryptic
splice junctions. The branch point consensus sequence (CURAY)
is placed underneath two putative branch points. B.
Sequencing of PCR products of genomic DNA from three
individuals who were homozygous normal, heterozygous, and
homozygous defective (based on their SmaI restriction
digests). The polymorphic SmaI restriction site is indicated
by the bracket in the homozygous wt sequence. The G→A base
pair change corresponding to position 681 of the cDNA is also
indicated. C. Schematic representation of splicing in
CYP2C19wt and in CYP2C19m. The black box indicates the 40 bp
that are deleted in exon 5 of poor metabolizers.

Figure 16: Additional 2C19 genomic sequence
flanking the 681 polymorphism. The wildtype (SEQ. ID. No. 51)
and mutant (SEQ. ID. No. 61) sequences are identical except
for the G/A transposition at nucleotide 681. Regions of
sequence ambiguity are indicated in lower case (n=any
nucleotide, k=G/T ambiguity, r=A/G ambiguity, m=A/C
ambiguity).

Figure 17: Genomic DNA sequence flanking the 636
polymorphism (also referred to as m2). Wildtype and mutant
sequences are designated SEQ. ID. Nos. 52 and 54 respectively.
Intron sequences are indicated in lower case and exons in
capitals. Translated amino acids (SEQ. ID. No. 53) are
indicated above the nucleotide sequence. The numbers 
underneath the sequences indicate the first (482) and last 
(642) nucleotides in exon 4. The two mutations found in exon 
4 are indicated in bold. The aberrant stop codon is indicated 
by the word "End." Exemplary primers for PCR amplification 
are underlined. 

Figure 18: Diagnosis of 636 mutation in 2C19. The 
position of the PCR primers is indicated by arrows at 79-55 
base pairs in intron 3 and 70-89 bp in intron 4. The size of 
the PCR products expected in the wild type gene (wt) and the 
size of the product in the 636 mutant allele are shown in the 
bottom lines. 

Figure 19: Simultaneous detection of the 636 and 
681 mutations. 

DEFINITIONS 

Abbreviations for the twenty naturally occurring
amino acids follow conventional usage (Immunology - A
Synthesis (E.S. Golub & D.R. Green, eds., Sinauer Associates,
Sunderland, MA, 2nd ed., 1991), hereby incorporated by
reference for all purposes). Stereoisomers (e.g., D-amino
acids) of the twenty conventional amino acids, unnatural amino
acids such as α,α-disubstituted amino acids, N-alkyl amino
acids, lactic acid, and other unconventional amino acids may
also be suitable components for polypeptides of the present
invention. Examples of unconventional amino acids include:
4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine,
ε-N-acetyllysine, O-phosphoserine, N-acetylserine,
N-formylmethionine, 3-methylhistidine, 5-hydroxylysine,
ω-N-methylarginine, and other similar amino acids and imino acids
(e.g., 4-hydroxyproline). In the polypeptide notation used
herein, the left-hand direction is the amino-terminal
direction and the right-hand direction is the carboxy-terminal
direction, in accordance with standard usage and convention.
Similarly, unless specified otherwise, the left-hand end of
single-stranded polynucleotide sequences is the 5' end; the
left-hand direction of double-stranded polynucleotide sequences
is referred to as the 5' direction. The direction of 5' to 3'
addition of nascent RNA transcripts is referred to as the
transcription direction; sequence regions on the DNA strand
that are 5' to the 5' end of the RNA transcript are referred
to as "upstream sequences"; sequence regions on the DNA strand
that are 3' to the 3' end of the RNA transcript are referred
to as "downstream sequences".

The phrase "polynucleotide sequence" refers to a
single or double-stranded polymer of deoxyribonucleotide or
ribonucleotide bases read from the 5' to the 3' end. It
includes self-replicating plasmids, infectious polymers of DNA
or RNA and non-functional DNA or RNA.

The following terms are used to describe the 
sequence relationships between two or more polynucleotides:
"reference sequence", "comparison window", "sequence
identity", "percentage of sequence identity", and "substantial
identity". A "reference sequence" is a defined sequence used
as a basis for a sequence comparison; a reference sequence may
be a subset of a larger sequence, for example, as a segment of
a full-length cDNA or gene sequence given in a sequence
listing, such as a polynucleotide sequence shown in SEQ. ID.
No. 2, or may comprise a complete cDNA or gene sequence.
Generally, a reference sequence is at least 20 nucleotides in 
length, frequently at least 25 nucleotides in length, and 
often at least 50 nucleotides in length. Since two 
polynucleotides may each (1) comprise a sequence (i.e., a 
portion of the complete polynucleotide sequence) that is 
similar between the two polynucleotides, and (2) may further 
comprise a sequence that is divergent between the two 
polynucleotides, sequence comparisons between two (or more) 
polynucleotides are typically performed by comparing sequences 
of the two polynucleotides over a "comparison window" to 
identify and compare local regions of sequence similarity. A 
"comparison window", as used herein, refers to a conceptual 
segment of at least 20 contiguous nucleotide positions wherein 
a polynucleotide sequence may be compared to a reference 
sequence of at least 20 contiguous nucleotides and wherein the 
portion of the polynucleotide sequence in the comparison 
window may comprise additions or deletions (i.e., gaps) of 20 
percent or less as compared to the reference sequence (which 
does not comprise additions or deletions) for optimal 
alignment of the two sequences. Optimal alignment of 
sequences for aligning a comparison window may be conducted by 
the local homology algorithm of Smith & Waterman, Adv. Appl. Math.
2:482 (1981), by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search
for similarity method of Pearson & Lipman, Proc. Natl. Acad.
Sci. (USA) 85:2444 (1988), by computerized implementations of
these algorithms (FASTDB (Intelligenetics), BLAST (National
Center for Biotechnology Information) or GAP, BESTFIT, FASTA, and
TFASTA (Wisconsin Genetics Software Package Release 7.0,
Genetics Computer Group, 575 Science Dr., Madison, WI)), or by
inspection, and the best alignment (i.e., resulting in the 
highest percentage of sequence similarity over the comparison 
window) generated by the various methods is selected. The 
term "sequence identity" means that two polynucleotide 
sequences are identical (i.e., on a nucleotide-by-nucleotide 
basis) over the window of comparison. The term "percentage of
sequence identity" (also sometimes referred to as "percentage
homology") is calculated by comparing two optimally aligned
sequences over the window of comparison, determining the
number of positions at which the identical nucleic acid base
(e.g., A, T, C, G, U, or I) occurs in both sequences to yield
the number of matched positions, dividing the number of
matched positions by the total number of positions in the
window of comparison (i.e., the window size), and multiplying
the result by 100 to yield the percentage of sequence
identity. The term "substantial identity" as used herein
denotes a characteristic of a polynucleotide sequence, wherein
the polynucleotide comprises a sequence that has at least 85
percent sequence identity, preferably at least 96 percent
sequence identity, more usually at least 97, 98 or 99 percent
sequence identity as compared to a reference sequence over a
comparison window of at least 20 nucleotide positions,
frequently over a window of at least 25-50 nucleotides,
wherein the percentage of sequence identity is calculated by
comparing the reference sequence to the polynucleotide
sequence which may include deletions or additions which total
20 percent or less of the reference sequence over the window 
of comparison. The reference sequence may be a subset of a 
larger sequence, for example, as a segment of the full-length 
sequence of SEQ. ID. Nos. 2, 6 or 12. 
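
The percentage-of-sequence-identity calculation defined above
can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the two sequences are
taken to be already optimally aligned and of equal length, a
"-" character is a hypothetical convention for a gap position,
and the function names are invented for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over one comparison window.

    Both strings are assumed to be already optimally aligned and of
    equal length, with '-' marking a gap (hypothetical convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    window_size = len(aligned_a)
    if window_size == 0:
        raise ValueError("comparison window is empty")
    # A position is matched only when both sequences carry the same
    # base there; a gap never counts as a match.
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matched / window_size


def gap_fraction(aligned_a: str, aligned_b: str) -> float:
    """Fraction of window positions that are gaps in either sequence."""
    gaps = sum(1 for a, b in zip(aligned_a, aligned_b) if "-" in (a, b))
    return gaps / len(aligned_a)


def is_substantially_identical(aligned_a: str, aligned_b: str,
                               threshold: float = 85.0,
                               min_window: int = 20,
                               max_gap_fraction: float = 0.20) -> bool:
    """Apply the 85 percent, 20-position and 20-percent-gap criteria."""
    return (
        len(aligned_a) >= min_window
        and gap_fraction(aligned_a, aligned_b) <= max_gap_fraction
        and percent_identity(aligned_a, aligned_b) >= threshold
    )
```

For a 20-position window with one mismatch, the function
returns 19/20 x 100 = 95 percent, which satisfies the 85
percent floor but not the preferred 96 percent level.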

As applied to polypeptides, the term "substantial
identity" (or "substantial homology") means that two peptide
sequences, when optimally aligned, such as by the programs
BLAZE (Intelligenetics), GAP or BESTFIT using default gap
weights, share at least 85 percent sequence identity,
preferably at least 96 percent sequence identity, more
preferably at least 97, 98 or 99 percent sequence identity or
more (e.g., 99.5 percent sequence identity). Preferably,
residue positions which are not identical differ by
conservative amino acid substitutions. Conservative amino acid
substitutions refer to the interchangeability of residues
having similar side chains. For example, a group of amino
acids having aliphatic side chains is glycine, alanine, valine,
leucine, and isoleucine; a group of amino acids having
aliphatic-hydroxyl side chains is serine and threonine; a group
of amino acids having amide-containing side chains is
asparagine and glutamine; a group of amino acids having
aromatic side chains is phenylalanine, tyrosine, and
tryptophan; a group of amino acids having basic side chains is
lysine, arginine, and histidine; and a group of amino acids
having sulfur-containing side chains is cysteine and
methionine. Preferred conservative amino acid substitution
groups are: valine-leucine-isoleucine, phenylalanine-tyrosine,
lysine-arginine, alanine-valine, and asparagine-glutamine.
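
The side-chain groups listed above lend themselves to a simple
lookup. The following Python sketch encodes the six groups and
the five preferred groups using standard one-letter amino acid
codes; the dictionary and function names are hypothetical and
the treatment of identical residues (not counted as a
substitution) is an assumption of this example.

```python
# Side-chain groups from the definition above, in one-letter codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}

# Preferred conservative substitution groups from the same passage.
PREFERRED_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def is_conservative(residue_a: str, residue_b: str) -> bool:
    """True if two different residues share one of the side-chain groups."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in g and b in g for g in SIDE_CHAIN_GROUPS.values())


def is_preferred_conservative(residue_a: str, residue_b: str) -> bool:
    """True if two different residues share a preferred group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False
    return any(a in g and b in g for g in PREFERRED_GROUPS)
```

Under this sketch a glycine-to-alanine change is conservative
but not preferred, whereas alanine-to-valine is both.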

The term "substantially pure" means an object 
species is the predominant species present (i.e., on a molar 
basis it is more abundant than any other individual species in 
the composition), and preferably a substantially purified
fraction is a composition wherein the object species comprises
at least about 50 percent (on a molar basis) of all
macromolecular species present. Generally, a substantially
pure composition will comprise more than about 80 to 90
percent of all macromolecular species present in the
composition. Most preferably, the object species is purified
to essential homogeneity (contaminant species cannot be 
detected in the composition by conventional detection methods) 
wherein the composition consists essentially of a single 
macromolecular species. 

The term "naturally-occurring" as used herein as
applied to an object refers to the fact that an object can be
found in nature. For example, a polypeptide or polynucleotide
sequence that is present in an organism (including viruses)
that can be isolated from a source in nature and which has not
been intentionally modified by man in the laboratory is
naturally-occurring.

The term "epitope" includes any protein determinant 
capable of specific binding to an immunoglobulin or T-cell 
receptor. Epitopic determinants usually consist of chemically
active surface groupings of molecules such as amino acids or
sugar side chains and usually have specific three dimensional
structural characteristics, as well as specific charge
characteristics.

Specific binding exists when the dissociation
constant for a dimeric complex is ≤ 1 µM, preferably ≤ 100 nM
and most preferably ≤ 1 nM.

The term "allelic variants" refers to gene sequences
mapping to the same chromosomal location in different
individuals in a species but showing a small degree of sequence
divergence from each other. Typically, allelic variants 
encode polypeptides exhibiting at least 96% or 97% amino acid 
sequence identity with each other. 

The term "nonallelic variants" refers to gene
sequences that show similar structural and/or functional
properties but map at different chromosomal locations in an 
individual. In the 2C family, nonallelic variants typically 
exhibit 70-96% amino acid sequence identity with each other. 

The term "cognate variants" refers to gene sequences
that are evolutionarily and functionally related between
humans and other species such as primates, porcines, bovines
and rodents such as mice and rats. Thus, the cognate primate
gene to a human 2C19 gene is the primate gene which encodes an
expressed protein which has the greatest degree of sequence
identity to the 2C19 protein and which exhibits an expression
pattern similar to that of the 2C19 protein.

Stringent conditions are sequence dependent and will
be different in different circumstances. Generally, stringent
conditions are selected to be about 5°C lower than the
thermal melting point (Tm) for the specific sequence at a
defined ionic strength and pH. The Tm is the temperature
(under defined ionic strength and pH) at which 50% of the
target sequence hybridizes to a perfectly matched probe.
Typically, stringent conditions will be those in which the
salt concentration is at least about 0.02 molar at pH 7 and
the temperature is at least about 60°C. As other factors may
significantly affect the stringency of hybridization,
including, among others, base composition and size of the
complementary strands, the presence of organic solvents and
the extent of base mismatching, the combination of parameters 
is more important than the absolute measure of any one. 
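
As a rough illustration of selecting a hybridization
temperature about 5°C below the Tm, the Python sketch below
estimates Tm for a short oligonucleotide and subtracts the
offset. The text above does not specify how Tm is obtained;
the Wallace rule used here (2°C per A or T, 4°C per G or C) is
a swapped-in rule of thumb for short probes at high salt, not
a method taken from this specification, and the function names
are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Rule-of-thumb Tm in degrees C for a short DNA oligonucleotide.

    Assumption: the Wallace rule, Tm = 2*(A+T) + 4*(G+C). It ignores
    ionic strength, pH, mismatches and organic solvents, all of which
    the text notes can change stringency.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be a non-empty string of A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(tm_celsius: float, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees below the Tm, as described above."""
    return tm_celsius - offset
```

A measured or nearest-neighbor Tm can be passed to
`stringent_temperature` in place of the rule-of-thumb value.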

A polymorphism is a condition in which two or more
different nucleotide sequences at a given site in a DNA
sequence coexist in the same interbreeding population.

The term "oligonucleotide" refers to a molecule
comprised of two or more deoxyribonucleotides or
ribonucleotides, such as primers, probes, nucleic acid
fragments to be detected, and nucleic acid controls. The
exact size of an oligonucleotide depends on many factors and
the ultimate function or use of the oligonucleotide.
Oligonucleotides can be prepared by any suitable method,
including, for example, cloning and restriction of appropriate
sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al., Meth. Enzymol.
68:90-99 (1979); the phosphodiester method of Brown et al.,
Meth. Enzymol. 68:109-151 (1979); the diethylphosphoramidite
method of Beaucage et al., Tetrahedron Lett. 22:1859-1862
(1981); and the solid support method of U.S. Patent No.
4,458,066. 

A primer is an oligonucleotide, whether natural or
synthetic, capable of acting as a point of initiation of DNA
synthesis under conditions in which synthesis of a primer
extension product complementary to a nucleic acid strand is
induced, i.e., in the presence of four different nucleoside
triphosphates and an agent for polymerization (i.e., DNA
polymerase or reverse transcriptase) in an appropriate buffer
and at a suitable temperature. 

"Probe" refers to an oligonucleotide which binds 
through complementary base pairing to a subsequence of a 
target nucleic acid. Probes will typically hybridize to
target sequences lacking complete complementarity with the
probe sequence on reducing the stringency of the hybridization
conditions. The probes are preferably directly labelled, as
with isotopes, or indirectly labelled, such as with biotin to
which a streptavidin complex may later bind. By assaying for
the presence or absence of the probe, one can detect the
presence or absence of the target.

"Subsequence" refers to a sequence of nucleic acids
that comprises a part of a longer sequence of nucleic acids.

The term "target region" refers to a region of a
nucleic acid to be analyzed such as a polymorphic region. 

Hybridization refers to binding between an 
oligonucleotide and a target sequence via complementary base
pairing to achieve the desired priming by PCR polymerases or
detection of hybridization signal, and sometimes embraces
minor mismatches that can be accommodated by reducing the 
stringency of the hybridization conditions. 

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

The invention provides novel cytochrome P450 2C 
polypeptides, DNA fragments encoding these polypeptides and 
cell lines expressing the polypeptides. The invention also 
provides methods of using the novel polypeptides for, inter
alia, identifying drugs metabolized by S-mephenytoin
4'-hydroxylase activity.

I. Polypeptides

In one embodiment, the invention provides novel
cytochrome P450 2C polypeptides, designated 2C18 and 2C19.
The 2C18 and 2C19 proteins are nonallelic with each other and
with known 2C polypeptides. An exemplary 2C19 polypeptide has
the amino acid sequence designated SEQ. ID. No. 1. The
invention also provides allelic variants of the exemplified
2C19 polypeptide, and natural and induced mutants of such
variants. The invention provides human 2C19 polypeptides and
cognate variants thereof. Typically, 2C19 variants exhibit
substantial sequence identity (e.g., at least 96% or 97% amino
acid sequence identity) with the exemplified 2C19 polypeptide
and cross-react with antibodies specific to this polypeptide.
2C19 variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the
exemplified 2C19 variant (SEQ. ID. No. 2).

Some 2C19 polypeptides, including the exemplified
polypeptide, exhibit high levels of stereospecific
S-mephenytoin 4'-hydroxylase activity. See Table IV. Indeed,
it is highly probable that 2C19 represents the principal human
determinant of this activity. Typically such 2C19
polypeptides exhibit a stereospecific S-mephenytoin
4'-hydroxylase activity of about 0.5-100, 1-10 or about 4-6
nmol S-mephenytoin per nmol 2C19 polypeptide per minute.
Frequently, the activity of 2C19 polypeptides is higher than
that of native human liver microsomes. The activity of such
polypeptides for the R-enantiomer of mephenytoin is typically
at least 10, 50 or 100-fold lower.

Other 2C19 polypeptides may lack substantial
stereospecific S-mephenytoin 4'-hydroxylase activity. Such
polypeptides represent allelic variants of the exemplified
2C19 polypeptide. These polypeptides sometimes exhibit low
levels of mephenytoin 4'-hydroxylase activity (i.e., less than
about 0.5 or 0.2 nmol mephenytoin per nmol 2C19 polypeptide
per minute). This activity may or may not be
stereospecific. Although the presence of a 2C19 polypeptide
with low enzymic activity could account for the phenotype of a
few individuals defective in S-mephenytoin 4'-hydroxylase
activity, the phenotype in most such individuals results from
a complete or substantial absence of 2C19 polypeptide. See,
e.g., Figure 10.
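
The activity figures above are expressed as nmol product per
nmol P450 per minute. The Python sketch below shows that
arithmetic and a simple comparison against the approximately
0.5 nmol/nmol/min boundary between the high and low ranges
quoted above. The function names, the single cut-off and the
example inputs are illustrative assumptions, not values or
procedures taken from the Examples.

```python
HIGH_ACTIVITY_FLOOR = 0.5  # nmol/nmol P450/min, lower end of the high range


def specific_activity(product_nmol: float, p450_nmol: float,
                      minutes: float) -> float:
    """Turnover in nmol product per nmol P450 per minute."""
    if p450_nmol <= 0 or minutes <= 0:
        raise ValueError("P450 amount and incubation time must be positive")
    return product_nmol / (p450_nmol * minutes)


def classify_2c19_activity(activity: float) -> str:
    """Label an activity as 'high' or 'low' using the quoted boundary."""
    return "high" if activity >= HIGH_ACTIVITY_FLOOR else "low"


def fold_preference(s_activity: float, r_activity: float) -> float:
    """Ratio of S-enantiomer to R-enantiomer activity."""
    if r_activity <= 0:
        raise ValueError("R-enantiomer activity must be positive")
    return s_activity / r_activity
```

For instance, 3 nmol of product formed by 0.02 nmol of enzyme
in 30 minutes corresponds to about 5 nmol/nmol/min, which
falls inside the 4-6 range given for typical 2C19
polypeptides.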

The invention also provides 2C18 polypeptides. The 
amino acid sequences of two allelic variants of 2C18 are 
designated SEQ. ID. Nos. 5 and 11. Also provided are allelic
variants of the exemplified 2C18 polypeptides, conjugated
variants thereof, and natural and induced mutants of any of
these. Typically, 2C18 variants exhibit substantial sequence
identity (e.g., at least 96% or 97% amino acid sequence
identity) with the exemplified 2C18 polypeptides and
cross-react with antibodies specific to these polypeptides. 2C18
variants are usually encoded by nucleic acids that show
substantial sequence identity (e.g., at least 96% or 97%
sequence identity) with the nucleic acid encoding the 
exemplified 2C18 variants (SEQ. ID. Nos. 6 and 12). 

2C18 polypeptides typically show low levels of
mephenytoin 4'-hydroxylase activity (0.01-0.2 nmol mephenytoin
per nmol 2C18 polypeptide per min). For some 2C18
polypeptides, the activity shows a small degree of
stereoselectivity (up to about five-fold). However, by
contrast to the 2C19 polypeptides, such stereoselectivity as
is shown by 2C18 polypeptides is in favor of the R enantiomer.
Some variants of 2C18 show high levels of a distinct enzymic
activity, namely, tolbutamide hydroxylase activity (e.g.,
about 50-200 pmol tolbutamide per nmol 2C18 polypeptide per
min). Conceivably, some variants of 2C18 exhibit novel
enzymic or regulatory functions not shared by other 2C family
members.

Besides substantially full-length polypeptides, the
present invention provides fragments of full-length 2C18 and
2C19 polypeptides. Some such fragments share the enzymic
activity of a full-length polypeptide. A segment of a
full-length 2C18 or 2C19 polypeptide will ordinarily comprise at
least 50 contiguous amino acids and more usually, 100, 200 or
400 contiguous amino acids from one of the exemplified
polypeptide sequences, designated SEQ. ID. Nos. 1, 5 and 11.
Fragments of full-length 2C18 and 2C19 polypeptides are often
terminated at one or both of their ends near (i.e., within
about 5, 10 or 20 aa of) the boundaries of functional or
structural domains. Fragments are useful for, inter alia,
generating antibodies specific to a 2C19 or 2C18 polypeptide.
Fragments consisting essentially of the hypervariable regions
of these polypeptides are preferred immunogens for
generating antibodies specific to a particular allelic
variant.

II. Nucleic Acid Fragments

In another aspect of the invention, nucleic acid
fragments are provided. An exemplified cDNA sequence of a
2C19 polypeptide is designated SEQ. ID. No. 2. Exemplified
cDNA sequences encoding two variant 2C18 polypeptides are
designated SEQ. ID. Nos. 6 and 12. The exemplified sequences
include both translated regions and 3' and 5' flanking
regions. The exemplified sequence data can be used to design
probes for other DNA fragments encoding 2C18 or 2C19
polypeptides (or fragments thereof). These DNA fragments
include human genomic clones, cDNAs and genomic clones from
other species, allelic variants, and natural and induced
mutants of any of these. Specifically, all nucleic acid
fragments encoding all 2C18 and 2C19 polypeptides disclosed in
this application are provided. Genomic libraries of many
species are commercially available (e.g., Clontech, Palo Alto,
CA), or can be isolated de novo by conventional procedures.
cDNA libraries are best prepared from liver extracts.

The probes used for isolating clones typically
comprise a sequence of at least about 15, 20 or 25 contiguous
nucleotides (or their complement) of an exemplified DNA
sequence (i.e., SEQ. ID. Nos. 2, 6 or 12). Preferably probes
are selected from regions of the exemplified sequences that
show a high degree of variation between different 2C
nonallelic variants. Hypervariable regions are the nucleic
acids encoding amino acids 181-210, 220-248, 283-296 and
461-479. Probes from these regions are likely to hybridize to
allelic variants but not to nonallelic variants of the
exemplified sequences under stringent conditions. Allelic
variants can be isolated by hybridization screening of plaque
lifts (Benton & Davis, Science 196:180 (1978)). Alternatively,
cDNAs can be prepared from liver mRNA by polymerase chain
reaction (PCR) methods. 5'- and 3'-specific primers for 2C19
are designed based on the nucleotide sequence designated SEQ.
ID. No. 2. See generally PCR Technology: Principles and
Applications for DNA Amplification (ed. H.A. Erlich, Freeman
Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and
Applications (eds. Innis, et al., Academic Press, San Diego,
CA, 1990); Mattila et al., Nucleic Acids Res. 19:4967 (1991);
Eckert et al., PCR Methods and Applications 1:17 (1991); PCR
(eds. McPherson et al., IRL Press, Oxford); and U.S. Patent
4,683,202 (each of which is incorporated by reference for all
purposes).
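
To locate probe sequences for the hypervariable regions named
above, the amino acid ranges can be converted to nucleotide
positions within the coding sequence. The Python sketch below
assumes that nucleotide 1 is the A of the initiating ATG and
that residue 1 is the initiating methionine; positions in SEQ.
ID. No. 2 itself would be shifted by the length of its 5'
flanking region, which is not restated here. The function
names and the sliding-window probe listing are hypothetical.

```python
# Hypervariable regions, as amino acid ranges, from the paragraph above.
HYPERVARIABLE_RESIDUES = [(181, 210), (220, 248), (283, 296), (461, 479)]


def residues_to_cds_span(first_residue: int, last_residue: int):
    """1-based inclusive nucleotide span within the coding sequence.

    Assumes nucleotide 1 is the A of the ATG start codon, so residue n
    is encoded by nucleotides 3n-2 through 3n.
    """
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("invalid residue range")
    return (3 * first_residue - 2, 3 * last_residue)


def candidate_probes(cds: str, first_residue: int, last_residue: int,
                     probe_len: int = 20):
    """All contiguous probe_len-mers lying wholly inside the region."""
    start, end = residues_to_cds_span(first_residue, last_residue)
    if len(cds) < end:
        raise ValueError("coding sequence is shorter than the region")
    region = cds[start - 1:end]
    return [region[i:i + probe_len]
            for i in range(len(region) - probe_len + 1)]
```

With these assumptions the four regions map to coding-sequence
nucleotides 541-630, 658-744, 847-888 and 1381-1437.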

Nucleotide substitutions, deletions, and additions 
can be incorporated into the polynucleotides of the invention. 
Nucleotide sequence variation may result from degeneracy of
the genetic code, from sequence polymorphisms of 2C18 and 2C19
alleles, minor sequencing errors, or may be introduced by
random mutagenesis of the encoding nucleic acids using
irradiation or exposure to EMS, or by changes engineered by
site-specific mutagenesis or other techniques. See Sambrook
et al., Molecular Cloning: A Laboratory Manual (C.S.H.P.
Press, NY 2d ed., 1989) (incorporated by reference for all
purposes).

III. Cell Lines

In another embodiment of the invention, cell lines 
capable of expressing the nucleic acid segments described 
above are provided. Stable cell lines are preferred to cell 
lines conferring transient expression. Stable cell lines can 
be passaged at least fifty times without reduction in the 
level of 2C polypeptides expressed by the cell lines. 
Preferably, cell lines are capable of being cultured so as to 
express 2C polypeptides at high levels, usually at least 0.2, 
1, 10, 20, 50, 100, 200 or 500 pmol of 2C polypeptide per mg 
of microsomal protein. For example, the 2C19 expression level 
of many cell lines of the invention is typically about 0.2- 
10,000, 1-200, 7-100, 10-50 or 10-20 pmol 2C19 polypeptide per 
mg microsomal protein. An expression level of 10 pmol 2C19 
per mg microsomal protein means that 2C19 represents about 
0.06% of total cellular protein. For E. coli and insect cell 
lines, the recombinant P450 protein can comprise 5-10% of 
total cellular protein. Often, the stable cell lines of the 
invention express more than one P450 polypeptide. These cell 
lines express 2C18 and/or 2C19 together with other members of 
the 2C family, or other P450 cytochromes such as 1A1, 1A2, 
2A6, 3A3, 3A4, 2B6, 2B7, 2C9, 2D6, and/or 2E1. 
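
The relation between an expression level in pmol per mg
microsomal protein and a percentage by mass can be checked
with a short calculation. The Python sketch below assumes a
molecular mass of about 56,000 Da for the P450 polypeptide;
that figure is an assumption of this example (typical for a
P450 of roughly 490 residues) and is not stated in the text.
The result is a mass percentage of the microsomal protein in
which the level was measured.

```python
ASSUMED_P450_MASS_DA = 56_000.0  # assumed molecular mass, g/mol


def percent_of_protein_mass(pmol_per_mg: float,
                            molecular_mass_da: float = ASSUMED_P450_MASS_DA
                            ) -> float:
    """Mass percentage represented by pmol_per_mg of a polypeptide."""
    grams_p450 = pmol_per_mg * 1e-12 * molecular_mass_da  # pmol -> mol -> g
    grams_protein = 1e-3                                   # 1 mg of protein
    return 100.0 * grams_p450 / grams_protein
```

Under this assumption, 10 pmol per mg corresponds to about
0.056 percent by mass, in line with the approximately 0.06%
figure quoted above.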

E. coli is one prokaryotic host useful for cloning 
the polynucleotides of the present invention. Other microbial 
hosts suitable for use include bacilli, such as Bacillus
subtilis, and other enterobacteriaceae, such as Salmonella,
Serratia, and various Pseudomonas species. Expression vectors
typically contain expression control sequences compatible with
the host cell, e.g., an origin of replication, any of a
variety of well-known promoters, such as the lactose promoter
system, a tryptophan (trp) promoter system, a beta-lactamase
promoter system, or a promoter system from phage lambda.
Vectors often also contain an operator sequence and/or a
ribosome binding site. The control sequences are operably
linked to a P450 DNA segment so as to ensure and control its
expression.

Other microbes, such as fungi, especially yeast,
are particularly useful for expression. Saccharomyces is a
preferred host, with suitable vectors having expression
control sequences, such as promoters, including
3-phosphoglycerate kinase or other glycolytic enzymes, and an
origin of replication, termination sequences and the like as
desired. For example, the plasmid pAAH5 can be used. The
5'-noncoding sequence of the P450 2C cDNAs can be eliminated and
six adenosines added by polymerase chain reaction (PCR)
amplification to optimize expression in yeast cells. The 5'-
and 3'-primers recommended for amplification of 2C18 are
5'-GCAAGCTTAAAAAATGGATCCAGCTGTGGCTCT-3' (SEQ. ID. No. 15) and
5'-GCAAGCTTGCCAAACTATCTGCCCTTCT-3' (SEQ. ID. No. 16). This
includes addition of a Hind III restriction site at both ends
to allow insertion into the pAAH5 vector and six adenosines
at the 5'-end to optimize translation. The final 20 bases of
each sequence are specific, respectively, for 20 bases at the
5'-end of 2C18 starting with the ATG for methionine and 20
bases of the 3'-noncoding region. The primers for 2C19 can be
constructed similarly. The yeast strain used, Saccharomyces
cerevisiae 334, can be propagated non-selectively in YPD
medium (1% yeast extract, 2% peptone, 2% dextrose) (Hovland et
al. (1989) Gene 83, 57-64) and Leu+ transformants selected on
synthetic minimal medium containing 0.67% nitrogen base
(without amino acids), 0.5% ammonium sulfate, 2% dextrose and
20 µg/ml L-histidine (SD+His). Plates are made by the addition
of 2% agar. Yeast can be transformed by the lithium acetate
method of Ito et al. (1983) J. Bacteriol. 153, 163 and
transformants selected on SD+His. Cells are then grown
to mid-logarithmic phase (Oeda et al., DNA 4:203-210 (1985)) 
and microsomes containing recombinant protein can be prepared. 
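
The layout of the two 2C18 primers described above (a short 5'
clamp, a Hind III site, an adenosine run on the forward
primer, and a 20-base gene-specific tail) can be made explicit
with a small Python sketch. The primer strings are those given
for SEQ. ID. Nos. 15 and 16; the forward primer is written in
three pieces only to make the adenosine count unambiguous. The
function name and the returned field names are hypothetical.

```python
HINDIII_SITE = "AAGCTT"  # Hind III recognition sequence

# SEQ. ID. No. 15 (forward) and SEQ. ID. No. 16 (reverse), 5' to 3'.
FORWARD_2C18 = "GCAAGCTT" "AAAAA" "ATGGATCCAGCTGTGGCTCT"
REVERSE_2C18 = "GCAAGCTTGCCAAACTATCTGCCCTTCT"


def dissect_primer(primer: str, gene_specific_len: int = 20) -> dict:
    """Split a primer into clamp, Hind III site, spacer and specific tail."""
    site_start = primer.find(HINDIII_SITE)
    if site_start < 0:
        raise ValueError("no Hind III site found in primer")
    site_end = site_start + len(HINDIII_SITE)
    tail_start = len(primer) - gene_specific_len
    if tail_start < site_end:
        raise ValueError("primer too short for the requested tail length")
    return {
        "clamp": primer[:site_start],          # bases 5' of the site
        "site": primer[site_start:site_end],   # Hind III site
        "spacer": primer[site_end:tail_start], # added bases, if any
        "gene_specific": primer[tail_start:],  # final 20 bases
    }
```

Dissecting the forward primer in this way shows five added
adenosines between the Hind III site and the gene-specific
tail; the tail itself begins with the A of the ATG, giving a
run of six adenosines in total.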

Insect cells (e.g., SF9) with appropriate vectors, 
usually derived from baculovirus, are also suitable for
expressing 2C polypeptides. See Luckow, et al. Bio/Technology
6:47-55 (1988) (incorporated by reference for all purposes).

Mammalian tissue cell culture can also be used to
express and produce the polypeptides of the present invention
(see Winnacker, From Genes to Clones (VCH Publishers, N.Y.,
N.Y., 1987)). Suitable host cell lines include CHO cell lines
(e.g., V79) (Dogram et al. (1990) Mol. Pharmacol. 37,
607-613), various COS cell lines, HeLa cells, myeloma cell lines
and Jurkat cells, hepatoma cell lines (Hep G2), and a
lymphoblastoid cell line AHH-1 TK+/- (Crespi et al. (1991)
Carcinogenesis 12, 355-359). Expression vectors for these
cells (e.g., pEBVHistK or pSV2) can include expression control
sequences, such as an origin of replication, a promoter (e.g.,
an HSV tk promoter or pgk (phosphoglycerate kinase) promoter),
an enhancer (Queen et al., Immunol. Rev. 89:49 (1986)), and
necessary processing information sites, such as ribosome
binding sites, RNA splice sites, polyadenylation sites (e.g.,
an SV40 large T Ag poly A addition site), and transcriptional
terminator sequences. Preferred expression control sequences
are promoters derived from immunoglobulin genes, SV40,
adenovirus, bovine papillomavirus, and the like. Expression
control sequences are operably linked to a DNA segment 
encoding a P450 polypeptide so as to ensure the polypeptide is 
expressed. 

The vectors containing the polynucleotide sequences
of interest can be transferred into the host cell by
well-known methods, which vary depending on the type of cellular
host. For example, calcium chloride transfection is commonly
utilized for prokaryotic cells, whereas calcium phosphate
treatment or electroporation may be used for other cellular
hosts. (See generally Sambrook et al., Molecular Cloning: A
Laboratory Manual (Cold Spring Harbor Press, 2nd ed., 1989)
(incorporated by reference in its entirety for all purposes).)

Once expressed, the polypeptides of the invention
and their fragments can, if desired, be purified according to
standard procedures of the art, including ammonium sulfate
precipitation, affinity columns, column chromatography, gel
electrophoresis and the like (see generally Scopes, Protein
Purification (Springer-Verlag, N.Y., 1982)).

IV. Antibodies 

The invention also provides antibodies that 
specifically bind to epitopes on the 2C18 and 2C19 
polypeptides of the invention. Some antibodies specifically 
bind to one member of the 2C family (e.g., 2C19) without 
binding to nonallelic forms. Some antibodies specifically 
bind to a single allelic form of a 2C member such as the 2C19 
polypeptide having the amino acid sequence designated SEQ. ID.
No. 1. Antibodies that specifically bind to a 2C19
polypeptide without binding to a 2C9 polypeptide are
particularly useful in view of the relatively high degree of
sequence identity between these nonallelic variants. See
Table II. The production of non-human monoclonal antibodies,
e.g., murine, lagomorph or equine, is well known and can be
accomplished by, for example, immunizing an animal with a
preparation containing a 2C19 polypeptide or an immunogenic
fragment thereof. Human antibodies can be prepared using
phage-display technology. See, e.g., Dower et al., WO
91/17271 and McCafferty et al., WO 92/01047 (each of which is
incorporated by reference in its entirety for all purposes).
Humanized antibodies are prepared as described by Queen et
al., WO 90/07861.

V. Methods of Use 

A. Identification of Drugs Unsuitable for
Administration to Poor Metabolizers of S-Mephenytoin

The identification of a 2C19 polypeptide as the
principal determinant of human S-mephenytoin 4'-hydroxylase
activity facilitates methods of screening drugs that are
metabolized by this enzyme. Such drugs likely lack efficacy
and/or show intolerable side effects in individuals having a
defect in S-mephenytoin 4'-hydroxylase activity (low
producers). The substantial absence of this activity in low
producers often results in an inability to detoxify such
drugs, preventing their elimination from the body.
Substantial absence of S-mephenytoin 4'-hydroxylase activity
can also prevent metabolic processing of certain drugs to
activated forms. Drugs suspected of being metabolized by
S-mephenytoin 4'-hydroxylase activity include, in addition to
mephenytoin itself, omeprazole, proguanil, diazepam and
certain barbiturates.

Drugs are screened for metabolic processing by
S-mephenytoin 4'-hydroxylase activity in a variety of assays.
See Example 5. In brief, the drug under test is usually
labelled with a radioisotope or otherwise. The drug is then
contacted with a 2C19 polypeptide exhibiting S-mephenytoin
4'-hydroxylase activity (e.g., the polypeptide designated SEQ.
ID. No. 1). The 2C19 polypeptide can be in purified form or
can be a component of a lysate of one of the cell lines
discussed in Section III. Often, the 2C19 polypeptide is part
of a microsomal fraction of a cell lysate. The 2C19
polypeptide can also be a component of an intact cell as many
drugs are taken up by such cells. Often, the reaction mixture
is supplemented with one or more of the following reagents:
dilauroylphosphatidylcholine, cytochrome P450 reductase, human
cytochrome b5, and NADPH. (See Example 5 for concentrations
of these reagents and a suitable buffer.) After an incubation
period (e.g., 30 min), the reaction is terminated and the
mixture centrifuged. The supernatant is analyzed for metabolic
activity, e.g., by a spectrographic or chromatographic method.
The assay is usually performed in parallel on a control
reaction mixture without a 2C19 polypeptide. Metabolic
activity is shown by a comparative analysis of supernatants
from the test and control reaction mixtures. For example, a
shift in retention time of radiolabeled peaks between test
and control under HPLC analysis indicates that the drug under
test is metabolized by S-mephenytoin 4'-hydroxylase activity.
Often, the test is repeated using an extract from human liver
in place of the 2C19 polypeptide. The appearance of a
labelled metabolic peak from the reaction using 2C19
recombinant organisms or 2C19 recombinant cell fractions
having the same HPLC retention time, and a specific activity
at least as high, as that observed for human liver microsomes
provides strong evidence that S-mephenytoin 4'-hydroxylase
activity plays a major role in processing the drug. The test
can also be repeated using other 2C members, such as 2C18, as
controls, in place of 2C19.

Drugs can also be screened for metabolic dependence
on S-mephenytoin 4'-hydroxylase activity in transgenic
nonhuman animals. Some such animals have genomes comprising a
2C19 transgene (e.g., SEQ. ID. No. 2) operably linked to
control sequences so as to render the transgene capable of
being expressed in the animals. Other transgenic animals have
a genome containing homozygous null mutations of endogenous
2C19 genes. Mice and other rodents are particularly suitable
for production of transgenic animals. Drugs are administered
to transgenic animals in comparison with normal control
animals and the effects from administration are monitored.
Drugs eliciting different responses in the transgenic animals
than the control animals likely require S-mephenytoin
4'-hydroxylase activity for detoxification and/or activation.

Drugs identified by the above screening methods as
being metabolized by S-mephenytoin 4'-hydroxylase activity
should generally not be administered to individuals known to
be deficient in this enzyme, or should be administered at
different dosages. Indeed, in the absence of data on an
individual patient's S-mephenytoin 4'-hydroxylase phenotype, it
is often undesirable to administer such drugs to any member of
an ethnic group known to be at high risk for S-mephenytoin
4'-hydroxylase deficiency (e.g., Orientals and possibly blacks).
If it is essential to administer drugs identified by the above
screening procedures to individuals known to be at risk of
enzymic deficiency (e.g., no alternative drug is available), a
treating physician is at least apprised of a need for vigilant
monitoring of the patient's response to the drug. In general,
the identification of a new drug as a substrate for 2C19 would
militate against further development of the drug.

B. Screening Compounds for Mutagenic, Cytotoxic or
Carcinogenic Activity

The invention provides methods of measuring the
mutagenic, cytotoxic or carcinogenic potential of a compound.
In some methods, mutagenic, cytotoxic or carcinogenic effects
are assayed directly on a cell line harboring one or more
recombinant cytochrome P450 enzymes. In these methods, a
compound under test is added to the growth medium of a cell
line expressing 2C19, and/or 2C18 and/or other cytochrome
P450s. Often, one or more of the reagents discussed in
Section V(A), supra, is also added. After a suitable
incubation, mutagenic, cytotoxic or carcinogenic effects are
assayed. Mutagenic effects are assayed, e.g., by detection of
the appearance of drug-resistant mutant cell colonies
(Thompson, Methods Enzymol., 58:308, 1979). For example,
mutagenicity can be evaluated at the hgprt locus (Penman et
al., (1987) Environ. Mol. Mutagenesis 10, 35-60).
Cytotoxicity can be assayed from viability of the cell line
harboring the P450 enzyme(s). Carcinogenicity can be assessed
by determining whether the cell line harboring the P450
enzymes has acquired anchorage-independent growth or the
capacity to induce tumors in athymic nude mice.

In other methods, a suspected compound is assayed in
a selected test cell line rather than a cell line harboring
P450 enzymes. In these methods, the compound under test is
contacted with P450 2C19 and/or 2C18 and/or other P450
enzymes. The P450 enzyme(s) can be provided in purified form,
or as components of lysates or microsomal fractions of cells
harboring the recombinant enzyme(s). The P450 enzyme(s) can
also be provided as components of intact cells. Usually, one
or more of the reagents discussed in Section V(I), supra, is
also added. Optionally, the appearance of metabolic products
from the suspected compound can be monitored by techniques
such as thin layer chromatography or high performance liquid
chromatography and the like.

The metabolic products resulting from treatment of
the suspected compound with P450 enzyme(s) are assayed for
mutagenic, cytotoxic or carcinogenic activity in a test cell
line. The test cell line can be present during the metabolic
activation of the mutagen or can be added after activation has
occurred. Suitable test cell lines include a mutant strain of
Salmonella typhimurium bacteria having auxotrophic histidine
mutations (Ames et al., Mut. Res. 31:347-364 (1975)). Other
standard test cell lines include Chinese hamster ovary cells
(Galloway et al., Environ. Mutagen. 7:1 (1985); Gulati et al.,
Environ. Mol. Mutagenesis 13:133-193 (1989)) for analysis of
chromosome aberration and sister chromatid exchange induction,
and mouse lymphoma cells (Myhr et al., Prog. Mut. Res.
5:555-568 (1985)).

The use of defined P450 enzymes for activation of 
compounds in the present methods offers significant advantages 
over previous methods in which rat or human S9-supernatant
liver fractions (containing an assortment of P450 enzymes)
were used. The present methods are more reproducible and also
provide information on the mechanisms by which mutagenesis, 
cytotoxicity and carcinogenicity are effected. 

C. Identification of Potential Chemopreventive Drugs

The invention also provides methods for identifying
drugs having chemopreventive activity. These methods employ
similar procedures to those discussed in paragraph (2) above
except that the methods are performed using a known mutagenic,
cytotoxic or carcinogenic agent, together with a suspected
chemopreventive agent. Mutagenic, cytotoxic or carcinogenic
effects in the presence of the chemopreventive agents are
compared with those in control experiments in which the
chemopreventive agent is omitted.

D. Screening for Potential Chemotherapeutic Drugs

The invention provides analogous methods to those
described in paragraph (2), supra, for screening
chemotherapeutic agents. In some methods, chemotherapeutic
activity is determined directly on a tumorigenic cell line
expressing 2C19 and/or 2C18 and/or other cytochrome P450
enzymes. In other methods, chemotherapeutic activity is
determined on a tumorigenic test cell line. Chemotherapeutic
activity is evidenced by reversion of the transformed
phenotype of cells resulting in reduced soft agar growth or
reduced tumor formation in nude mice.

E. Programmed Cell Death 

The invention provides analogous methods to those
described in paragraph (2), supra, for identifying agents that
induce programmed cell death or apoptosis. Apoptosis may have
an important impact on prevention of malignant transformation.
Programmed cell death is assayed by DNA fragmentation or
cell-surface antigen analysis.

F. Monitoring 2C18 and 2C19 Polypeptides

The invention provides methods of quantitating the
amount of the specific protein in mammalian tissues by
measuring the complex formed between the antibody and proteins
in the tissue. For example, a biological sample is contacted
with an antibody under conditions such that the antibody binds
to specific proteins, forming an antibody:protein complex which
can be quantitatively detected.

VI. Diagnosing 2C19 and 2C18 Polymorphisms 
Diagnostic Assays for Identifying Individuals Deficient in
S-Mephenytoin 4'-Hydroxylase

The invention provides a variety of assays for
identifying individuals deficient in S-mephenytoin 4'-hydroxylase
activity. Such individuals comprise about 3-5% of
Caucasian populations and about 20% of Orientals and possibly
blacks. Identification of individuals deficient in
S-mephenytoin 4'-hydroxylase activity is important in selecting
appropriate drugs for treatment of these individuals.
Usually, drugs that are metabolized by S-mephenytoin
4'-hydroxylase should not be administered to these individuals.
The assays diagnose mutations in cDNA or genomic DNA encoding
2C19, which, as discussed above, is the principal human
determinant of S-mephenytoin 4'-hydroxylase activity. The
cDNA assays are particularly useful for de novo localization
of a 2C19 mutation to a particular nucleotide or nucleotides.
The genomic assays are particularly useful for large-scale
screening of individuals for the presence of a mutation that
has previously been localized.

A. Amplification Technologies 

Many of the diagnostic assays rely on amplification
of part or all of a DNA segment encoding a 2C19 polypeptide
(e.g., a 2C19 gene). In a preferred embodiment, target
segments encoding a 2C19 polypeptide are amplified by the
polymerase chain reaction (PCR). The PCR process is described in,
e.g., U.S. Patent Nos. 4,683,195; 4,683,202; and 4,965,188;
PCR Technology: Principles and Applications for DNA
Amplification (ed. Erlich, Freeman Press, New York, NY, 1992);
PCR Protocols: A Guide to Methods and Applications (eds. Innis
et al., Academic Press, San Diego, CA (1990)); Mattila et al.,
Nucleic Acids Res. 19:4967 (1991); Eckert & Kunkel, PCR Methods
and Applications 1:17 (1991); PCR (eds. McPherson et al., IRL
Press, Oxford) (each of which is incorporated by reference in
its entirety for all purposes). Reagents, apparatus and
instructions for using the same are commercially available
(e.g., from PECI). Other amplification systems include
ligase chain reaction, Qβ RNA replicase and
RNA-transcription-based amplification systems.

To amplify a target nucleic acid sequence in a 
sample by PCR, the sequence must be accessible to the 
components of the amplification system. Accessibility can be 
achieved by isolating the nucleic acids from the sample. A 
variety of techniques for extracting nucleic acids from 
biological samples are known in the art. Alternatively, if
the sample is fairly readily disruptable, the nucleic acid
need not be purified prior to amplification by the PCR
technique. For example, if the sample comprises cells,
particularly peripheral blood lymphocytes or monocytes, lysis
and dispersion of the intracellular components may be
accomplished merely by suspending the cells in hypotonic
buffer. See Han et al., Biochemistry 26:1617-1625 (1987).

For amplification of mRNA sequences, a first step is
the synthesis of a DNA copy (cDNA) of the region to be
amplified by reverse transcription. Reverse transcription is
the polymerization of deoxynucleoside triphosphates to form
primer extension products that are complementary to a
ribonucleic acid template. The process is effected by reverse
transcriptase, an enzyme that initiates synthesis at the 3'-end
of the primer and proceeds toward the 5'-end of the
template until synthesis terminates. Examples of suitable
polymerizing agents that convert the RNA target sequence into
a complementary, copy-DNA (cDNA) sequence are reverse
transcriptases (RT), such as avian myeloblastosis virus RT and
Moloney murine leukemia virus RT, and Thermus thermophilus
(Tth) DNA polymerase, a thermostable DNA polymerase with
reverse transcriptase activity marketed by PECI. Reverse
transcription can be carried out as a separate step, or in a
homogeneous reverse transcription-polymerase chain reaction
(RT-PCR).

The first step of each amplification cycle of the
PCR involves the separation of the nucleic acid duplex formed
by the primer extension. Strand separation is achieved by
heating the reaction to a sufficiently high temperature for a
sufficient time to cause the denaturation of the duplex but
not to cause an irreversible denaturation of the polymerase
(see U.S. Patent No. 4,965,188). Typical heat denaturation
involves temperatures ranging from about 80 °C to 105 °C for
times ranging from seconds to minutes. Typically, any initial
RNA template is also degraded during the denaturation step,
leaving only DNA template. Other means of strand separation,
including physical, chemical, or enzymatic means, are also
possible.

Once the strands are separated, the next step
involves hybridizing the separated strands with primers that
flank the target sequence. The primers are then extended to
form complementary copies of the target strands. Template-
dependent extension of primers in PCR is catalyzed by a
polymerizing agent in the presence of adequate amounts of four
deoxyribonucleotide triphosphates (typically dATP, dGTP, dCTP,
and dTTP) in a reaction medium comprised of the appropriate
salts, metal cations, and pH buffering system. Suitable
polymerizing agents include, for example, E. coli DNA
polymerase I or its Klenow fragment, T4 DNA polymerase, Tth
polymerase, and Taq polymerase, a heat-stable DNA polymerase
isolated from Thermus aquaticus and commercially available from
Perkin-Elmer Cetus Instruments (PECI, Norwalk, CT). See U.S.
Patent No. 4,889,818. See also Gelfand, 1989 in PCR Technology,
supra. The polymerizing agents initiate synthesis at the 3'-end
of the primer and proceed toward the 5'-end of the
template until synthesis terminates.

The primers are designed so that the position at 
which each primer hybridizes along a duplex sequence is such 
that an extension product synthesized from one primer, when
separated from the template (complement), serves as a template
for the extension of the other primer. The cycle of
denaturation, hybridization, and extension is repeated as many 
times as necessary to obtain the desired amount of amplified 
nucleic acid. 
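
As a rough illustration of how the number of cycles relates to the
amount of amplified nucleic acid, the following minimal sketch uses
the idealized exponential model in which each cycle multiplies the
number of target copies by (1 + efficiency). The model, the function
names and the efficiency parameter are assumptions introduced here for
illustration and are not taken from the specification; real reactions
plateau as reagents are consumed.

```python
import math


def copies_after_cycles(initial_copies: float, cycles: int,
                        efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling). It is an illustrative parameter.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float,
                  efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches target_copies."""
    if initial_copies <= 0 or target_copies <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("copies must be positive and efficiency within (0, 1]")
    if target_copies <= initial_copies:
        return 0
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))
```

Under perfect doubling, a single template molecule would need 20
cycles to exceed one million copies; with a lower assumed efficiency
the same target requires more cycles.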

The primers are selected to be substantially
complementary to the different strands of each specific
sequence to be amplified. This means that the primers must be
sufficiently complementary to hybridize with their respective
strands. Therefore, the primer sequence need not reflect the
exact sequence of the template. For example, a
non-complementary nucleotide fragment may be attached to the 5'-end
of the primer with the remainder of the primer sequence
being complementary to the strand. Alternatively,
non-complementary bases or longer sequences can be interspersed
into the primer, provided that the primer sequence has
sufficient complementarity with the sequence of the strand to
be amplified to hybridize therewith and thereby form a
template for synthesis of the extension product of the other
primer.

Paired primers for amplification of a given segment 
of DNA are designated forward and reverse primers. 
Conventionally, the orientation of a double-stranded DNA
molecule is that of the sense (or coding) strand, with the
5'-terminus of the coding strand being drawn on the left (see,
e.g., Fig. 15). Under this convention, the forward primer
hybridizes to a double-stranded DNA molecule at a position 5' 
(or upstream) from the reverse primer. The forward primer 
hybridizes to the complement of the coding strand of the 
double stranded sequence (i.e., the antisense strand) and the 
reverse primer hybridizes to the coding strand. 
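
The strand convention above can be sketched in a few lines. In this
minimal example the forward primer is copied directly from the coding
strand (so it anneals to the antisense strand), and the reverse primer
is the reverse complement of a downstream stretch of the coding strand
(so it anneals to the coding strand). The function names and the short
test sequence are hypothetical and serve only to illustrate the
convention; they are not sequences from 2C19.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primer_pair(coding_strand: str, start: int, end: int,
                       length: int) -> tuple[str, str]:
    """Return (forward, reverse) primers flanking coding_strand[start:end].

    Both primers are written 5'->3'. The forward primer matches the
    coding strand; the reverse primer matches the antisense strand.
    """
    segment = coding_strand[start:end].upper()
    if length <= 0 or len(segment) < 2 * length:
        raise ValueError("segment too short for two non-overlapping primers")
    forward = segment[:length]
    reverse = reverse_complement(segment[-length:])
    return forward, reverse


# Hypothetical coding-strand sequence, 5' end on the left.
coding = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
forward_primer, reverse_primer = design_primer_pair(coding, 0, len(coding), 10)
```

Here the forward primer appears verbatim at the 5' (left) end of the
coding strand, whereas the reverse primer is found in the coding
strand only after taking its reverse complement, at a position
downstream of the forward primer.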

The appropriate length of a primer depends on the 
intended use of the primer but typically ranges from 10-100, 
15-50, 15-30, or more usually, 15 to 25 nucleotides. Shorter 
primers tend to lack specificity for a target nucleic acid 
sequence and generally require cooler temperatures to form 
sufficiently stable hybrid complexes with the template. 
Longer primers are expensive to produce and can sometimes
self-hybridize to form hairpin structures.
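
The dependence of hybrid stability on primer length can be
illustrated with a common rule of thumb for short oligonucleotides,
often called the Wallace rule, which estimates the melting temperature
as 2 °C per A or T plus 4 °C per G or C. This rule is not part of the
specification; it is used here only as an assumed approximation, and
the function name is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) of a short primer.

    Uses the rule of thumb Tm = 2*(A+T) + 4*(G+C), an approximation
    intended for short oligonucleotides only.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# The 20-nucleotide forward primer given in this document for the 681
# polymorphism, and a truncated 10-nucleotide version for comparison.
full_length = wallace_tm("AATTACAACCAGAGCTTGGC")
truncated = wallace_tm("AATTACAACC")
```

Under this approximation the 20-nucleotide primer gives an estimate of
58 °C and its first ten nucleotides give 26 °C, consistent with the
statement that shorter primers require cooler temperatures to form
stable hybrids.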

The spacing of primers determines the length of 
segment to be amplified. The spacing is not usually critical 
and amplified segments can range in size from about 25 bp to 
at least 35 kbp. Segments of 25-2000, 50-1000, 100-500 bp or
about 400 bp are typical. For larger segments, difficulties
may occasionally be encountered in obtaining efficient and 
accurate amplification. For smaller segments, analysis of 
amplification products may be more difficult. 

The primer can be labelled, if desired, by 
incorporating a label detectable by spectroscopic, 
photochemical, biochemical, immunochemical, or chemical means. 
For example, useful labels include 32P, fluorescent dyes,
electron-dense reagents, enzymes (as commonly used in an
ELISA), biotin, or haptens and proteins for which antisera or
monoclonal antibodies are available. A label can also be used
to "capture" the primer, so as to facilitate the
immobilization of either the primer or a primer extension
product, such as amplified DNA, on a solid support.

B. Tissue Sample for Analysis 

The diagnostic assays are performed on a tissue 
sample containing a nucleic acid encoding a 2C19 polypeptide. 
For assay of genomic DNA, virtually any tissue sample (other 
than pure red blood cells) is suitable. For example, 
convenient tissue samples include whole blood, buccal, skin 
and hair. For assay of cDNA, the tissue sample must be 
obtained from an organ in which a 2C19 gene is expressed, such 
as the liver. Liver samples from dead patients are suitable 
for de novo localization of mutations (see Section C, infra). 
However, for screening of living persons, liver biopsies, 
while feasible, are generally undesirable. Thus, for large- 
scale screening of living persons, analysis of genomic DNA is 
preferred. 

C. De Novo Localization of 2C19 Polymorphisms 

2C19 polymorphisms are identified and localized to
specific nucleotides by comparison of nucleic acids from poor
metabolizing individuals with nucleic acids from extensive
metabolizers. The comparison can be initiated directly at the
genomic level. If intron primers are known, individual exons
and intron/exon junctions of 2C19 can be amplified from 
genomic DNA. These fragments can be sequenced directly or 
analyzed by single-stranded conformational analysis to 
indicate the presence of a polymorphism and then analyzed by 
sequencing. 

Comparison is sometimes initiated at the cDNA level
because of the shorter size of cDNA (about 1750 bp) relative
to genomic DNA (about 55 kbp). cDNA is amplified from liver
samples of individuals known to have phenotypic S-mephenytoin
metabolic deficiencies, and the cDNA sequence is compared with
the wildtype sequence shown in SEQ. ID. No. 2. Often, the
full-length cDNA is amplified. An initial comparison can be
performed by single-stranded conformational analysis to
indicate the existence of a polymorphism. The polymorphism is
then localized by sequence analysis indicating the site of
mutations in cDNA. Of course, the amplification product can
also be sequenced directly without prior conformational
analysis. Having localized a mutation in cDNA, a
corresponding region of genomic 2C19 DNA is amplified. The
genomic DNA is usually amplified from primers spanning the
mutation. At least one of the primers for this amplification
usually comprises a subsequence of the cDNA sequence proximate
to (i.e., within 25-200 bp of) the cDNA mutation. Primers can
also comprise subsequences of genomic 2C19 DNA that have
already been sequenced, subsequences from related genomic
sequences, such as 2C18 or 2C9 (see de Morais et al., Biochem.
Biophys. Res. Commun. 194:194-201 (1993)) (incorporated by
reference in its entirety for all purposes), or can be random.
An amplified genomic fragment spanning the portion of the
coding region in which the cDNA polymorphism occurs is
sequenced and compared with the corresponding region from a
2C19 sequence from an individual exhibiting extensive
S-mephenytoin 4'-hydroxylase metabolism to identify the locus of
the genomic mutation.

In some instances, there will be a simple 
relationship between genomic and cDNA mutations. That is, a 
single base change in a coding region of genomic DNA can give 
rise to a corresponding mutated codon in the cDNA. In other
instances, the relationship between genomic and cDNA mutations
is more complex. Thus, for example, a single base change in 
genomic DNA creating an aberrant splice site can give rise to 
deletion of a substantial segment of cDNA in a poor 
metabolizing individual. 

D. The 681 and 636 Polymorphisms 

The principal mutation in individuals deficient in
S-mephenytoin 4'-hydroxylase activity is designated the
681 polymorphism. See Example 7. The 681 polymorphism
results from a single-base mutation in genomic 2C19 DNA at
nucleotide position 681 of the coding region. A nucleotide in
a coding (i.e., exonic) region of genomic 2C19 DNA is
designated the same number as the corresponding nucleotide in
the cDNA sequence shown in SEQ. ID. No. 2, when the genomic
coding sequence is maximally aligned with the cDNA sequence.
The 681 polymorphism is a G-to-A transition at
nucleotide 681 of the coding region. Homozygous mutations at
this position occur in about 70% of individuals having a
low-producing (i.e., defective) S-mephenytoin 4'-hydroxylase
phenotype. The mutation is inherited in an autosomal
recessive fashion. Thus, individuals heterozygous in this
mutation usually exhibit normal (i.e., extensive) S-mephenytoin
4'-hydroxylase activity. Fortuitously, the mutation confers two
distinct properties that facilitate its identification. In genomic
DNA, the polymorphism results in loss of several restriction
enzyme sites (e.g., SmaI) and acquisition of other restriction
sites (e.g., EcoRII) in mutant individuals compared with
wildtype individuals. These restriction sites include the 681
nucleotide. In mRNA or cDNA, the 681 mutation results in a
deletion of 40 bp spanning nucleotides 643-682 of the wildtype
cDNA sequence shown in Fig. 12. The deletion is the
consequence of an altered splice pattern stemming from the
presence of the 681 polymorphism in genomic DNA.

A second polymorphism is designated the 636
polymorphism. See Example 8. The 636 polymorphism results
from a single-base mutation in genomic 2C19 DNA at nucleotide
position 636. The 636 polymorphism is a G-to-A
transition that introduces a premature stop codon into
2C19 mRNA. The mutation is easily recognized by the loss of,
e.g., a BamHI site in both genomic DNA and cDNA and acquisition of,
e.g., a HinfI site. The mutation is inherited in an autosomal
recessive fashion. Homozygous mutations at nucleotide 636
account for about 10% of low-producing phenotypes in
Orientals. Heterozygous individuals having one allele
defective in the 636 polymorphism and the other allele
defective in the 681 polymorphism account for all or nearly
all of the remaining 15% of low-producing Oriental
individuals. Thus, the 681 and 636 polymorphisms collectively
account for all, or nearly all, low-producing phenotypes in
Orientals.

In Caucasians, the 636 polymorphism is less
prevalent and some low-producing individuals probably have a
mutation at a locus other than nucleotide 681 or 636 of the
coding sequence. Conceivably, a few mutations might occur in
other genes that exert regulatory control over the 2C19 gene.
However, most, if not all, of the remaining mutations probably 
result from additional polymorphisms in the 2C19 gene. 

E. Screening Assays for Defined Mutations 

The invention provides assays that permit large-scale
screening of individuals for the presence of defined
mutations. Of course, detection of the 681 and 636 mutations,
which account for all or nearly all deficiencies in Orientals
and about 75% of deficiencies in Caucasians, is of primary
importance. An assay on an individual under test is often
performed in parallel with control assays on DNA samples from
subjects of known phenotype (i.e., extensive or poor
metabolizer of S-mephenytoin).

1. Genomic Assays

Assays are preferably performed on a genomic 
substrate because of the ready availability of tissue samples 
containing genomic DNA. 

a. Amplification of Segments Spanning a 
Defined Mutation 

A preferred strategy for analysis entails
amplification of a DNA sequence spanning previously localized
polymorphism(s) (e.g., the 681 and/or 636 polymorphisms).
Amplification of such a sequence can be primed from forward
and reverse primers that hybridize to a 2C19 gene on opposite
sides of a mutation (e.g., the 681 mutation), but which do not
hybridize to the mutated nucleotide itself. That is, for
detection of the 681 polymorphism, the forward primer
hybridizes upstream or 5' to the 681 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
Similarly, for detection of the 636 polymorphism, the forward
primer hybridizes upstream or 5' to the 636 nucleotide and the
reverse primer hybridizes downstream or 3' to this nucleotide.
For simultaneous analysis of 636 and 681 polymorphisms, the
forward primer hybridizes upstream or 5' to the 636 nucleotide
and the reverse primer hybridizes downstream or 3' to
nucleotide 681.

The forward primer is sufficiently complementary to
the antisense strand of a 2C19 DNA sequence to hybridize
therewith and the reverse primer is sufficiently complementary
to the sense strand of the 2C19 sequence to hybridize
therewith. The primers usually comprise first and second
subsequences from opposite strands of a double-stranded 2C19
DNA sequence. Isolated points of mismatch between a primer
and a corresponding 2C19 subsequence can usually be tolerated
but are not preferred. It is particularly important to avoid
mismatches in the two nucleotides at the 3' end of the primer
(especially the terminal nucleotide).

Because allelic variants of 2C19 exhibit at least
about 97% sequence identity to each other, it is not critical
which variant is selected as a source of subsequences for
incorporation into forward and reverse primers. For example,
suitable subsequences can be obtained from the genomic 2C19
sequence defined as wildtype in Figs. 15-17. Fig. 15 provides
genomic sequence immediately flanking the 681 mutation, and
Figure 16 provides more distal flanking sequences. Figure 17
provides genomic sequence flanking the 636 mutation. These
figures provide sufficient sequence for selection of a
multitude of paired primers for amplification of a sequence
spanning the 681 and/or 636 polymorphisms. Although there is
no apparent advantage in doing so, additional genomic
sequence flanking the regions already sequenced could easily
be determined by PCR-based gene walking. See Parker et al.,
Nucl. Acids Res. 19:3055-3060. A specific primer for the
sequenced region is paired with a general primer that
hybridizes to the flanking region.

Forward primers often comprise about 10-50 and
preferably 15-30 contiguous nucleotides from the wildtype 2C19
sequences shown in Figs. 15-17 (which is the coding or sense
sequence). Reverse primers often comprise about 10-50 or
15-30 nucleotides from the complement of the wildtype 2C19
sequence shown in Figs. 15-17. The complement of the sequence
shown in Figs. 15-17 is also referred to as the antisense
sequence. A primer (or its complement) preferably exhibits
100% sequence identity with a corresponding 2C19 subsequence
to which it hybridizes over a window of about 15-30 bp. For
amplification of the 681 polymorphism, forward primers
preferably comprise a segment of contiguous nucleotides from
the fourth intronic region and reverse primers a segment of
contiguous nucleotides from the fifth exonic or intronic
region. For amplification of the 636 polymorphism, forward
primers preferably comprise a segment of contiguous
nucleotides from the third intronic region and reverse primers
a segment of contiguous nucleotides from the fourth intronic
region. For amplification of both the 636 and 681
polymorphisms, forward primers preferably comprise a segment
of contiguous nucleotides from the third intronic region and
reverse primers a segment of contiguous nucleotides from the
fifth exonic region or fifth intronic region. See Figure 19.
As noted above, the spacing of the subsequences is not
critical, but a separation of about 50-2000 bp is typical. For
simultaneous amplification of the 636 and 681 mutations, the
spacing is typically 1000-1500 bp. For amplification of
either mutation alone, a spacing of about 400 bp is typical.

Preferred primers exhibit perfect sequence identity
to 2C19 and lesser sequence identity to corresponding regions
of related genes, such as 2C9 and 2C18. Such primers are
designed by comparison of the wildtype 2C19 sequence shown in
Figs. 15-17 with corresponding sequences from 2C9 and 2C18
described by de Morais et al., supra. In general, sequence
divergence between the three genes is expected to be greater
in intronic sequences. An exemplary pair of primers for
amplifying a segment spanning the 681 mutation is described in
Example 7. A forward primer, 5'-AATTACAACCAGAGCTTGGC-3' (SEQ.
ID. No. 55), exhibits perfect sequence identity to a
subsequence from the wildtype 2C19 sense strand within
intron 4. A reverse primer, 5'-TATCACTTTCCATAAAAGCAAG-3'
(SEQ. ID. No. 56), exhibits perfect sequence identity to the
antisense strand of the wildtype 2C19 sequence within exon 5.
The amplification product from these primers has a length of
169 bp. An exemplary pair of primers for amplifying a segment
spanning the 636 mutation is described in Example 8. A
forward primer, 5'-TATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57),
exhibits perfect sequence identity to a subsequence from the
wildtype 2C19 sense strand within intron 3. A reverse primer,
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58), exhibits perfect
sequence identity to the antisense strand of the wildtype 2C19
sequence within intron 4. The amplification product from
these primers has a length of 329 bp.

Having amplified a segment of a 2C19 gene known to 
span a polymorphism, a variety of assays are available for 
determining whether a mutation is present in an individual
under test. A generally applicable, but relatively laborious
assay, is to sequence the amplified fragment across the 
polymorphic locus and compare the resulting sequence with the 
wildtype 2C19 sequence shown in Fig. 15-17. 

A simpler assay, but one applicable to only certain
mutations, is to compare the size or restriction profile of
the amplified segment, optionally in comparison with a
corresponding wildtype 2C19 segment. For the 681
polymorphism, restriction analysis provides a rapid and
clear-cut means of identifying a mutant allele. The 681
polymorphism results in loss of a SmaI site and acquisition of
an EcoRII site in mutant alleles. Thus, SmaI digestion of a
wildtype allele produces an extra band compared with a mutant
allele. For the amplification product obtained using the
exemplified primers discussed above, SmaI digestion of the
wildtype product yields fragments of 120 and 49 bp, whereas
the mutant amplification product remains uncut, yielding a
single fragment of 169 bp. In individuals homozygous for the
wildtype allele, only the 120 bp and 49 bp bands are present.
In individuals homozygous for the mutant allele, only the 169
bp band is present. In heterozygotes, all three bands (i.e.,
169, 120 and 49 bp) are present. The bands can usually be
detected by agarose or acrylamide gel electrophoresis and
ethidium bromide staining. If greater sensitivity is needed,
the amplification product is labelled and the bands detected
by, e.g., autoradiography. Of course, the assay can also be
performed using an isoschizomer of SmaI with identical
results. The assay can also be performed by digesting with
EcoRII or an isoschizomer thereof. In this case, one obtains
a mirror image of the results obtained for SmaI digestion,
because the mutant 2C19 allele contains an additional EcoRII
site relative to the wildtype allele. As a quality control
measure, both SmaI and EcoRII digestions can be performed on
separate aliquots of a test sample. Of course, any other
enzyme that recognizes a site that includes the 681
polymorphism can also be used. For example, alternatives to
SmaI (i.e., that cleave only the wildtype allele) include
AvaI, MspI, NciI, ScrFI and TspEI.
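
The interpretation of the SmaI band patterns described above can be
written out as a small lookup. The following is a minimal sketch; the
function name, the default arguments and the "uninterpretable" result
are assumptions added for illustration, while the fragment sizes (169,
120 and 49 bp) are those stated above for the exemplified primers. The
sketch assumes that digestion has gone to completion.

```python
def genotype_from_bands(bands, uncut=169, cut=(120, 49)):
    """Classify a genotype from observed restriction fragment sizes (bp).

    uncut is the size of the undigested amplification product and cut
    gives the two fragments expected when the enzyme cleaves the
    wildtype allele. Any other combination is reported as
    'uninterpretable' rather than guessed at.
    """
    observed = set(bands)
    cut_set = set(cut)
    if not observed or observed - cut_set - {uncut}:
        return "uninterpretable"      # no bands, or an unexpected band size
    cut_seen = observed & cut_set
    if cut_seen and cut_seen != cut_set:
        return "uninterpretable"      # only one of the two cut fragments seen
    if uncut in observed and cut_seen:
        return "heterozygous"
    if uncut in observed:
        return "homozygous mutant"
    return "homozygous wildtype"
```

For example, bands of 120 and 49 bp are classified as homozygous
wildtype, a single 169 bp band as homozygous mutant, and all three
bands as heterozygous. Because an incompletely digested wildtype
sample would also show all three bands, a control digestion of known
wildtype DNA run in parallel remains advisable.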

The 636 polymorphism can be similarly analyzed by
digestion with, e.g., BamHI. BamHI digestion of a wildtype
allele produces an extra band compared with a mutant allele.
For the amplification product obtained using the exemplified
primers discussed above, BamHI digestion of the wildtype
product yields fragments of 233 and 96 bp, and digestion of
the mutant product yields a single fragment of 329 bp. In
individuals homozygous for the wildtype allele, only the 233
bp and 96 bp bands are present. In individuals homozygous for
the mutant allele, only the 329 bp band is present. In
heterozygotes, all three bands are present. Of course, other
enzymes that cut the wildtype allele at the polymorphic locus
but not the 636 mutant allele, or vice versa, can also be
used. For example, alternatives to BamHI include AlwI, BsaJI,
BstVI, DpnI, EcoRII, NlaIV, Sau3AI and ScrFI. Enzymes that
recognize a site on the mutant allele including nucleotide
636, but do not recognize the wildtype allele, include HinfI
and TfiI.

For simultaneous detection of the 681 and 636
polymorphisms after amplification of a fragment spanning both
polymorphisms, the DNA can be double digested with two of the
enzymes mentioned above. One enzyme should distinguish
the mutant 681 allele from a wildtype allele and the
other should distinguish the mutant 636 allele from a wildtype
allele. For example, double digestion with SmaI and BamHI is
suitable. The double digestion generates six different
restriction patterns corresponding to the six possible
genotypes: wt/wt, wt/681, wt/636, 681/681, 636/636 and
681/636. See Figure 19.
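
The count of six genotypes follows from pairing three alleles (wt,
681 and 636) without regard to order. The sketch below enumerates
them; it assumes, as the list in the text implies, that a single
allele carries at most one of the two mutations. The function name is
illustrative only.

```python
from itertools import combinations_with_replacement

# The three alleles distinguished by the double digest.
ALLELES = ["wt", "681", "636"]


def possible_genotypes(alleles=ALLELES):
    """List every unordered pair of alleles as 'a/b' strings."""
    return ["/".join(pair)
            for pair in combinations_with_replacement(alleles, 2)]


if __name__ == "__main__":
    genotypes = possible_genotypes()
    print(len(genotypes), genotypes)
```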

In another assay, amplification products are
subjected to single-stranded conformational analysis. See,
e.g., Hayashi, PCR Methods & Applications 1, 34-38 (1991);
Orita, Proc. Natl. Acad. Sci. USA 86, 2766-2770 (1989); Orita
et al., Genomics 5, 874-879 (1989). This method is capable of
detecting many single base mutations in DNA fragments up to
200 bp irrespective of whether the mutation causes a change in
restriction fragment profile. In this method, the PCR
reaction is performed using at least one labelled nucleotide
or labelled primer to obtain a labelled amplified fragment.
The amplification product is then denatured and the strands
resolved by polyacrylamide gel electrophoresis under
nondenaturing conditions. Mutations are detected by altered
mobility of separated single strands.

b. Selective Amplification of an Allelic Variant

An alternative method for detecting defined
mutations in a 2C19 gene employs a selective strategy whereby
a wildtype allele is amplified without amplification of a
mutant allele (or vice versa). This is accomplished by
designing one of the primers to hybridize to a subsequence
overlapping a defined polymorphism (for example, the 681
polymorphism). Such a primer can be designed to hybridize to
one polymorphic allele without hybridizing to the other.
Thus, when such a primer is paired with a second primer
hybridizing distal to the polymorphic region, amplification
will only occur for one polymorphic allele.

For diagnosis of the 681 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and
usually 15-30, nucleotides from the wildtype 2C19 sequence
shown in Fig. 15 or 16, including nucleotide 681. Such a
forward primer when paired with any suitable reverse primer
downstream from nucleotide 681 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith) can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The selectivity
between amplification of wildtype and mutant alleles is
greatest when the 681 nucleotide occurs near, or preferably
at, the 3' end of the primer. Because extension proceeds from
the 3' end of the primer, a mismatch at or near this position
is most inhibitory of amplification. The same result can be
achieved by using a reverse primer that has about 10-50 or
usually 15-30 contiguous nucleotides from the complement of
the wildtype 2C19 sequence shown in Fig. 15 or 16 (i.e., the
antisense strand) including the nucleotide at position 681.
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to a subsequence of the
antisense strand of the 2C19 gene upstream from nucleotide 681
to hybridize therewith. The 681 nucleotide should again be at
or near the 3' end of the reverse primer.
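
The role of the 3'-terminal base can be illustrated with a toy
calculation. This is a sketch under stated assumptions: the template
below is a made-up placeholder and NOT the 2C19 sequence of Fig. 15
or 16, the variant position and the G-to-A change are chosen only for
illustration, and a string comparison is a crude stand-in for
hybridization and polymerase extension.

```python
def evaluate_forward_primer(primer, sense_strand, variant_index):
    """Align a forward primer so its 3' end sits on variant_index.

    Returns (number of mismatches, whether the 3'-terminal base
    matches). Indices are 0-based; all names are hypothetical.
    """
    start = variant_index - len(primer) + 1
    if start < 0 or variant_index >= len(sense_strand):
        raise ValueError("primer does not fit on the template")
    target = sense_strand[start:variant_index + 1]
    mismatches = sum(p != t for p, t in zip(primer, target))
    return mismatches, primer[-1] == target[-1]


# Placeholder sequences for illustration only.
VARIANT_INDEX = 16
WILDTYPE = "TTGATCAGGCTTACCCGGGAACTG"
MUTANT = WILDTYPE[:VARIANT_INDEX] + "A" + WILDTYPE[VARIANT_INDEX + 1:]

# A 12-nucleotide wildtype-specific primer ending on the variant base.
WT_PRIMER = WILDTYPE[VARIANT_INDEX - 11:VARIANT_INDEX + 1]

if __name__ == "__main__":
    print("on wildtype:", evaluate_forward_primer(WT_PRIMER, WILDTYPE, VARIANT_INDEX))
    print("on mutant:  ", evaluate_forward_primer(WT_PRIMER, MUTANT, VARIANT_INDEX))
```

On the mutant template the only mismatch falls on the 3'-terminal
base, which is the arrangement the text describes as most inhibitory
of amplification.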

Selective amplification of a 681 mutant allele is
accomplished by an analogous strategy in which primers are
designed to hybridize to the mutant allele without hybridizing
to the wildtype. A suitable forward primer for amplification
comprises about 10-50 or usually 15-30 contiguous nucleotides
from the mutant 2C19 sequence shown in Fig. 15 or 16 (i.e.,
the sense strand). The forward primer can be paired with any
suitable reverse primer sufficiently complementary to the
sense strand of a downstream 2C19 subsequence to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides from the complement of the mutant 2C19
sequence shown in Fig. 15 or 16 (i.e., the antisense strand).
Such a reverse primer can be paired with any suitable forward
primer sufficiently complementary to the antisense strand of
an upstream 2C19 subsequence to hybridize therewith.

For diagnosis of the 636 polymorphism, selective
amplification of the wildtype allele of 2C19 can be
accomplished using a forward primer that has about 10-50, and



WO 95/30766 



PCT/US95/05744




usually 15-30 nucleotides from the wildtype 2C19 genomic
sequence shown in Fig. 17, including nucleotide 636. Such a
forward primer when paired with any suitable reverse primer
downstream from nucleotide 636 (i.e., sufficiently
complementary to the sense strand of 2C19 to hybridize
therewith) can be used to amplify selectively the wildtype
allele without amplifying a mutant allele. The 636 nucleotide
usually occurs near, or preferably at, the 3' end of the
primer. The same result can be achieved by using a reverse
primer that has about 10-50 or usually 15-30 contiguous
nucleotides from the complement of the wildtype 2C19 genomic
sequence shown in Fig. 17 (i.e., the antisense strand)
including the nucleotide at position 636. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary to a sequence of the antisense
strand of the 2C19 gene upstream from nucleotide 636 to
hybridize therewith. The 636 nucleotide should again be at or
near the 3' end of the reverse primer.

For selective amplification of a 636 mutant allele, a
suitable forward primer for amplification comprises about
10-50 or usually 15-30 contiguous nucleotides including
nucleotide 636 from the mutant 2C19 genomic sequence shown in
Fig. 17 (i.e., the sense strand). The forward primer can be
paired with any suitable reverse primer sufficiently
complementary to the sense strand of a 2C19 genomic
subsequence downstream from nucleotide 636 to hybridize
therewith. Alternatively, the same result can be achieved
using a reverse primer comprising about 10-50 or 15-30
contiguous nucleotides including nucleotide 636 from the
complement of the mutant 2C19 sequence shown in Fig. 17 (i.e.,
the antisense strand). Such a reverse primer can be paired
with any suitable forward primer sufficiently complementary to
the antisense strand of a 2C19 subsequence upstream from
nucleotide 636 to hybridize therewith.

Following amplification, the sample under test is
characterized as wildtype or mutant by the presence or absence
of an amplification product. With a primer designed for
selective amplification of the wildtype allele, the presence
of an amplification product is indicative of that allele and
the absence of an amplification product is indicative of a
mutant allele. The converse applies for primers designed for
selective amplification of a mutant allele. In a preferred
assay, a sample is divided into two aliquots, one of which is
amplified using primers for wildtype allele amplification, the
other of which is amplified using primers appropriate for
mutant allele amplification. The presence of an amplification
product in one but not both of the aliquots indicates that the
individual under test is homozygous, either for the wildtype
allele or for the mutation (depending on the aliquot in which
the amplification product occurred). The presence of
amplification product in both aliquots indicates that the
individual is heterozygous. The absence of an amplification
product in both aliquots would indicate either the absence of
a 2C19 gene or a quality control problem in the amplification
procedure requiring that the assay be repeated.
Coamplification of a second known standard human gene using a
second set of primers can aid in distinguishing between these
possibilities. If both bands are missing, the problem is
probably quality control, while amplification of only the
standard gene suggests that the CYP2C19 gene may be deleted.
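
The interpretation rules for the two-aliquot assay amount to a small
decision table. The sketch below restates them; the function name,
argument names and returned wording are illustrative assumptions, and
the optional control argument stands for the coamplified standard
gene.

```python
def interpret_two_aliquot_assay(wildtype_product, mutant_product,
                                control_product=None):
    """Interpret presence (True) or absence (False) of product.

    wildtype_product: product seen with wildtype-selective primers
    mutant_product:   product seen with mutant-selective primers
    control_product:  product from the coamplified standard gene,
                      or None if no control was run
    """
    if wildtype_product and mutant_product:
        return "heterozygous"
    if wildtype_product:
        return "homozygous wildtype"
    if mutant_product:
        return "homozygous mutant"
    # No allele-specific product in either aliquot.
    if control_product is None:
        return "no product: gene absent or assay failure; repeat assay"
    if control_product:
        return "control only: 2C19 gene may be deleted"
    return "no bands: probable quality control problem; repeat assay"


if __name__ == "__main__":
    print(interpret_two_aliquot_assay(True, True))
    print(interpret_two_aliquot_assay(False, False, control_product=True))
```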

The presence or absence of amplification products
can be detected by gel electrophoresis. Gels are usually
visualized by ethidium bromide staining. However, if greater
sensitivity is required, fragments can be labelled in the
course of amplification. Amplified fragments can be
electrophoresed directly or can be cut with any restriction
enzyme that releases fragments of a convenient size from the
amplification products. For the simultaneous analysis of
multiple samples, the dot-blot method may be advantageous. In
the dot-blot method, multiple unlabelled amplification
mixtures are bound to discrete locations on a solid support,
such as a membrane. The membrane is incubated with labeled
probe under suitable hybridization conditions, the
unhybridized probe removed by washing, and the filter
monitored for the presence of bound probe.

c. Southern Blotting

For polymorphic mutations resulting in loss or
acquisition of a restriction site (such as the 681 and 636
polymorphisms), samples of genomic DNA can also be analyzed by
Southern blotting without the need for prior amplification.
The DNA is digested with an enzyme that cuts a wildtype allele
but not a mutant allele or vice versa (e.g., BamHI, SmaI,
EcoRII or HinfI, or isoschizomers of any of these). For
analysis of the 681 polymorphism, digestion with SmaI or
isoschizomers results in an additional fragment from the
wildtype allele compared with the mutant allele. Digestion
with EcoRII or isoschizomers results in an additional fragment
from the mutant allele. Digestion products are detected with
a 2C19 probe. For analysis of the 636 polymorphism, digestion
with BamHI or isoschizomers results in an additional fragment
from the wildtype allele compared with the mutant allele.
Digestion with HinfI results in an additional fragment from
the mutant allele. The probe can be any segment of a 2C19 DNA
sequence that includes the polymorphism and extends for at
least about 20 nucleotides on either side.

2. cDNA Assays 

Defined polymorphisms can also be detected by
analysis of cDNA by similar strategies to those employed for
genomic DNA. However, the primers appropriate for
amplification procedures are not necessarily interchangeable
for the two substrates. Suitable primers for analysis of the
681 and 636 polymorphisms in cDNA are described below.

a. Amplification of Segments Spanning a Defined Mutation

The 681 polymorphism in genomic DNA results in
a 40 bp deletion of cDNA comprising nucleotides 643-682 of the
wildtype 2C19 cDNA or genomic sequence shown in Fig. 12. The
forward and reverse primers are therefore designed to
hybridize to 2C19 subsequences on opposite sides of this
deletion. Thus, for example, a forward primer can hybridize
to the antisense strand of a 2C19 sequence upstream from
nucleotide 643 of the coding region. Such a forward primer
should be paired with a reverse primer that hybridizes to the
sense strand of the 2C19 sequence downstream from nucleotide
682. Nucleotides in a 2C19 DNA sequence are designated by the
numbers of the corresponding nucleotides in the wildtype cDNA
sequence shown in SEQ. ID. No. 2 (or Fig. 12, which shows a
subsequence of SEQ. ID. No. 2), when the sequences are
maximally aligned. Preferably, the forward primer comprises
about 10-50 or 15-30 contiguous nucleotides upstream of
nucleotide 645 from the wildtype 2C19 cDNA sequence shown in
Fig. 12 or SEQ. ID. No. 2. Analogously, the reverse primer
preferably comprises about 10-50 or 15-30 contiguous
nucleotides from the complement of the wildtype 2C19 cDNA
sequence shown in Fig. 12 or SEQ. ID. No. 2 downstream from
nucleotide 682 of the coding region. For example, a forward
primer comprising 5'-ATTGAATGAAAACATCAGGATTG-3' (SEQ. ID.
No. 59) and a reverse primer comprising
5'-GTAAGTCAGCTGCAGTGATTA-3' (SEQ. ID. No. 60) form a suitable
pair. The amplification product from such primers is 40 bp
longer for the wildtype 2C19 cDNA sequence than for the 681
mutant sequence.
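
The 40 bp size difference follows directly from the coordinates of
the deleted segment. A minimal sketch of the arithmetic is given
below; the primer coordinates used in the example (600 and 800) are
hypothetical and are not the positions of SEQ. ID. Nos. 59 and 60.

```python
# Deleted segment in the 681 mutant cDNA, 1-based and inclusive,
# as stated in the text.
DELETION = (643, 682)


def amplicon_lengths(product_start, product_end, deletion=DELETION):
    """Return (wildtype_bp, mutant_bp) for a product spanning the deletion.

    product_start and product_end are the outermost wildtype cDNA
    positions covered by the product (the 5' ends of the two primers).
    The primers themselves are assumed to lie outside the deletion.
    """
    del_start, del_end = deletion
    if not (product_start < del_start and product_end > del_end):
        raise ValueError("primers must flank the deleted segment")
    wildtype_bp = product_end - product_start + 1
    mutant_bp = wildtype_bp - (del_end - del_start + 1)
    return wildtype_bp, mutant_bp


if __name__ == "__main__":
    wt_bp, mut_bp = amplicon_lengths(600, 800)  # hypothetical coordinates
    print(wt_bp, mut_bp, wt_bp - mut_bp)
```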

For detection of the 636 polymorphism, the forward
and reverse primers are designed to hybridize to 2C19
subsequences on opposite sides of nucleotide 636. Thus, for
example, a forward primer can hybridize to the antisense
strand of a 2C19 sequence upstream from nucleotide 636 of the
coding region. Such a forward primer should be paired with a
reverse primer that hybridizes to the sense strand of the 2C19
sequence downstream from nucleotide 636 (SEQ. ID. No. 2 or
Fig. 12). Preferably, the forward primer comprises about
10-50 or 15-30 contiguous nucleotides upstream of nucleotide 636
from the wildtype 2C19 cDNA sequence shown in Fig. 12 or SEQ.
ID. No. 2. Analogously, the reverse primer preferably
comprises about 10-50 or 15-30 contiguous nucleotides from the
complement of the wildtype 2C19 cDNA sequence shown in Fig. 12
or SEQ. ID. No. 2 downstream from nucleotide 636 of the coding
region.

For simultaneous detection of the 636 and 681
polymorphisms, the forward primer should be as described for
detection of the 636 polymorphism and the reverse primer as
described for detection of the 681 polymorphism. These
primers will amplify a segment of DNA spanning both the 636
and 681 polymorphisms.

Amplification products are usually analyzed by gel
electrophoresis. The products can be analyzed uncut or can be
cleaved with any restriction enzyme having a site in the
amplification product. For detection of the 681 polymorphism,
SmaI and its isoschizomers are particularly useful because of
a restriction site present in wildtype 2C19
DNA that is not present in the mutant form. See Fig. 12.
Similarly, BamHI and its isoschizomers are particularly useful
for detection of the 636 polymorphism. Analysis of fragments
allows distinction between wildtype, homozygous mutant and
heterozygous genotypes as discussed for the corresponding
genomic assay.

b. Selective Amplification of an Allelic Variant

For analysis of the 681 polymorphism, selective
amplification of the wildtype variant is achieved by selecting
a forward or reverse primer that overlaps nucleotides 643-682
of the wildtype 2C19 cDNA sequence (Fig. 12). This segment of
nucleotides is not present in a mutant allele. Thus, a primer
hybridizing to this segment of the wildtype allele will not
hybridize to the mutant allele. Accordingly, such primers can
be used to prime amplification of the wildtype allele without
priming amplification of the mutant allele. For example, a
forward primer that hybridizes to the complement of the
wildtype 2C19 cDNA sequence shown in Fig. 12 between
nucleotides 643-682 without hybridizing to the complement of
the mutant 2C19 DNA sequence shown in Fig. 12 is suitable.
Such a forward primer can be paired with any suitable reverse
primer sufficiently complementary with a downstream
subsequence of the sense strand of the 2C19 cDNA to hybridize
therewith.

Alternatively, a reverse primer is designed that
hybridizes to the wildtype 2C19 cDNA sequence shown in Fig. 12
between nucleotides 643 and 682 without hybridizing to the
mutant 2C19 cDNA sequence shown in Fig. 12. Such a reverse
primer can be paired with any suitable forward primer
sufficiently complementary with an upstream subsequence of the
antisense strand of the 2C19 cDNA to hybridize therewith.

Primers for selective amplification of the mutant 
allele can also be designed. A suitable primer hybridizes to 
two 2C19 subsequences, of about 1-50, 5-30 or 10-20 
nucleotides, which subsequences are separated by nucleotides 
643-682 in the wildtype sequence, but which are contiguous in 
the mutant sequence. Such primers hybridize to mutant 2C19 
cDNA sequences without hybridizing to wildtype sequences. For 
example, a forward primer comprising a subsequence of 
nucleotides 633-642 of the wildtype 2C19 cDNA sequence shown 
in Fig. 12 joined to a second subsequence of nucleotides 684- 
693 of this sequence is suitable. This primer can be paired 
with any suitable reverse primer sufficiently complementary to 
a downstream subsequence of the sense strand of the 2C19 cDNA 
to hybridize therewith. 
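
A primer of this kind is simply two segments of the wildtype
numbering joined end to end. The sketch below shows the string
operation only; the coordinates are copied from the example in the
text, the function name is an assumption, and the sequence used is a
generated placeholder rather than the 2C19 cDNA of Fig. 12.

```python
def junction_primer(cdna, left=(633, 642), right=(684, 693)):
    """Join two 1-based, inclusive cDNA segments into one primer."""
    (l_start, l_end), (r_start, r_end) = left, right
    if not (1 <= l_start <= l_end < r_start <= r_end <= len(cdna)):
        raise ValueError("segments must be ordered and within the sequence")
    return cdna[l_start - 1:l_end] + cdna[r_start - 1:r_end]


# Placeholder sequence for illustration only, NOT the 2C19 cDNA.
TOY_CDNA = "".join("ACGT"[(i * 7 + i // 5) % 4] for i in range(700))

if __name__ == "__main__":
    primer = junction_primer(TOY_CDNA)
    print(len(primer), primer)
```

With the default coordinates the result is 20 nucleotides long, ten
from each side of the segment that is absent from the mutant cDNA.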

For analysis of the 636 polymorphism, primers can
be designed using the same strategy as discussed for selective
amplification of genomic DNA except that the primers, which
include nucleotide 636, are formed from nucleotide segments
from cDNA rather than genomic sequences.

Amplification products are analyzed using the same
methods as described for corresponding genomic amplification
products.

F. Diagnostic Kits

The invention also provides kits comprising useful
components for practicing the diagnostic methods of the
invention. The kits comprise at least one of the primers
discussed above. Kits usually contain a matched pair of
forward and reverse primers as described above for amplifying
a segment encompassing the 681 and/or the 636 polymorphism.
Some kits contain two matched pairs of primers, e.g., one pair
for analysis of the 681 polymorphism, the other pair for
analysis of the 636 polymorphism. For selective amplification
of mutant or wildtype alleles, kits usually contain a pair of
primers for amplification of the mutant allele and/or a
separate pair of primers for amplification of the wildtype
allele. Optional additional components of the kit include,
for example, restriction enzymes for analysis of amplification
products, such as BamHI, SmaI, HinfI and/or EcoRII (or
isoschizomers of any of these), reverse transcriptase or
polymerase, the substrate nucleoside triphosphates, means used
to label (for example, an avidin-enzyme conjugate and enzyme
substrate and chromogen if the label is biotin), and the
appropriate buffers for reverse transcription, PCR, or
hybridization reactions. Usually, the kit also contains
instructions for carrying out the methods.

G. Nucleic Acid Fragments

In another aspect, the invention provides fragments
of a mutant 2C19 allele spanning the 681 polymorphism and/or
636 polymorphism. The fragments usually have up to about 50,
100, 200, 500, 1000, 2000 or 10,000 bp of 2C19 sequence. Some
fragments comprise at least about ten contiguous nucleotides
including nucleotide 681 from the mutant 2C19 allele shown in
Fig. 15. Other fragments comprise at least about ten
contiguous nucleotides including nucleotide 636 from the
mutant 2C19 allele shown in Fig. 17. The fragments can be
single or double stranded. The fragments are provided in
substantially purified form. Usually, the fragments are the
result of PCR amplification. The fragments are useful in the
diagnostic assays discussed above.

The following examples are provided to illustrate 
but not to limit the invention. 

EXAMPLES

Materials. Human liver samples were obtained from
organ donors through the National Disease Research Interchange
in Philadelphia, PA, and from the Human Liver Research
Facility, Stanford Research Institute, Life Sciences Division,
Menlo Park, CA. Restriction endonucleases were purchased from
Pharmacia LKB Biotechnology, Inc. (Piscataway, NJ).
[α-32P]dCTP (3000 Ci/mmol), [γ-32P]ATP (500 Ci/mmol) and
[α-35S]dATP (650 Ci/mmol) were from Amersham Corp. (Arlington
Heights, IL). All other reagents were of the highest quality
available.

Conditions. Hybridization and washing conditions
for screening libraries with random-labeled cDNAs for 2C13(g)
or 254c used the same solutions as described for actin, but
were performed at nonstringent temperatures (42 °C).
Conditions for hybridization of clones with T300R were
identical with those described above. Hybridization of cDNA
clones with M300R (recognizes 2C9, 2C10, and 2C19)
(5'-ACTTTTCAATGTAAGCAAAT-3') (SEQ. ID. No. 17) was identical
except that for each oligomer the hybridization temperature
and the high-stringency wash were 5 °C below the calculated
melting temperatures.

Example 1: Construction and Screening of Human Liver cDNA
Libraries

Two cDNA libraries were constructed from human 
livers 860624 and S33, which differed phenotypically in the 
hepatic content of P450 HLx (2C8) (SEQ. ID. No. 8). Several 
partial cDNA clones were found but no full-length clones. 

A second cDNA library (from a liver phenotypically 
high in HLx) was then screened. Eighty-three essentially 
full-length (>1.8 kb) clones belonging to the 2C subfamily 
were isolated from this library. These include full-length 
clones for two additional new members of the 2C subfamily. 

The majority of the cDNAs characterized in the
high-HLx library (60%) were one of two allelic variants of 2C9,
while 35% represented 2C8 (SEQ. ID. No. 8). Two new genes
were identified (two allelic variants of 2C18 and 2C19).

The two cDNA libraries from individuals
phenotypically high and low in HLx were examined to determine
whether a variant mRNA for 2C8 (SEQ. ID. No. 8) was
responsible for the polymorphic expression of HLx and to
identify additional members of the 2C subfamily. No clones
for 2C8 (SEQ. ID. No. 8) were isolated from the
phenotypically low individual. Two allelic variants for 2C9
were isolated. In addition, full-length cDNAs for two
additional new members (2C18 and 2C19) were isolated. These
new members of the 2C subfamily were expressed in COS-1 cells
and shown to be immunochemically distinct from HLx and 2C9,
and 2C18 metabolized racemic mephenytoin.

Total human liver RNA was prepared by the guanidine
hydrochloride method (Cox, Methods Enzymol. 12:120-129 (1968))
from two human livers either low (860624) or high (S33) in HLx
as identified by immunoblot analysis. Poly(A+) RNA was then
isolated by two passages over an oligo(dT)-cellulose column
(Aviv et al., Proc. Natl. Acad. Sci. U.S.A. 69:1408-1412
(1972)). The low-HLx cDNA library was prepared by Stratagene
Cloning Systems (La Jolla, CA), and the double-stranded cDNA
was treated with S1 nuclease. Following the addition of EcoRI
linkers, the double-stranded cDNA was size-fractionated on a
CL-4B Sepharose column. The largest fraction was ligated into
λZAPII and then transfected into XL1-Blue. The high-HLx cDNA
library was constructed following the methods of Watson et
al., in DNA Cloning (Glover, D.M., Ed.) 1:79-88, IRL Press,
Washington, D.C. (1985). Double-stranded cDNA was ligated to
EcoRI linkers, size-fractionated on an agarose gel (1.8-2.4
kb), and then ligated into λZAPII (Stratagene) and transfected
into XL1-Blue.

The low-HLx library was screened under conditions of
low stringency with a 32P-labeled rat P450 2C13 cDNA probe and
with oligonucleotides for human 2C8 (SEQ. ID. No. 8) (T300R)
(5'-TTAGTAATTCTTTGAGATAT-3') (SEQ. ID. No. 18) and 2C9 (M300R)
(5'-CTGTTAGCTCTTTCAGCCAG-3') (SEQ. ID. No. 19). The high-HLx
library was screened under conditions of low stringency using
a 32P-labeled 254c cDNA probe derived from the first library
and M300R (2C9). Positive clones were isolated, transfected
into XL1-Blue, and excised into the plasmid Bluescript,
according to Stratagene's excision protocol.

Screening the cDNA library constructed from a
low-HLx individual with a cDNA for rat 2C13 under nonstringent
conditions and with oligonucleotide probes specific for 2C8
(SEQ. ID. No. 8) and 2C9 yielded several clones for 2C9 and a
partial DNA, clone 254c, which now appears to be an
incompletely characterized splice variant of the P450 2C
subfamily. None of the clones identified in this library were
full-length. Clone 186 was identical with but 25 base pairs
longer than MP-4, a 2C9 clone previously described by Ged et
al. (1988).

Approximately 40,000 plaques were then screened from
the library from liver S33 with the cDNA for 254c under
nonstringent conditions and with an oligonucleotide probe
specific for 2C9. Eighty-three essentially full-length 2C
clones (>1.8 kb) were isolated, purified, and partially or
completely sequenced (Table I). Of these, 29 clones were
found to encode cytochrome P450 2C8 (SEQ. ID. No. 8). One
clone (7b) of 2C8 (SEQ. ID. No. 8) was isolated which was
similar to Hpl-1 and Hpl-2 reported by Okino et al. (1987), but
different by having a tyrosine at position 130 instead of an
asparagine and an isoleucine at 264 instead of a methionine.

TABLE I

Distribution of P450 2C cDNA Clones from
Human Liver S33*

                                No. of Clones   % Distribution

2C8 (SEQ. ID. No. 8)                 29              35
2C9
   65 (SEQ. ID. No. 10)              39              47
   25 (SEQ. ID. No. 4)               11              13
2C10                                  0               0
2C18
   29c (SEQ. ID. No. 6)               1               1.2
   6b (SEQ. ID. No. 12)               2               2.5
2C19 (11a) (SEQ. ID. No. 2)           1               1.2
Total                                83             100

* Clones were classified by hybridization with specific
oligonucleotide probes and partial sequencing.
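
The percentage column of Table I can be recomputed from the clone
counts. The sketch below does so; the dictionary keys are shorthand
labels introduced here, and the recomputed values may differ slightly
from the rounded figures printed in the table.

```python
# Clone counts copied from Table I.
CLONE_COUNTS = {
    "2C8": 29,
    "2C9 (clone 65)": 39,
    "2C9 (clone 25)": 11,
    "2C10": 0,
    "2C18 (clone 29c)": 1,
    "2C18 (clone 6b)": 2,
    "2C19 (clone 11a)": 1,
}


def percent_distribution(counts):
    """Return each count as a percentage of the total number of clones."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("total clones:", sum(CLONE_COUNTS.values()))
    for name, pct in percent_distribution(CLONE_COUNTS).items():
        print(f"{name:18s} {pct:5.1f}%")
```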

There are a number of polymorphisms in the human
CYP2C subfamily. These include variations in the hepatic
levels of HLx (Wrighton et al., Arch. Biochem. Biophys.
306:240-245 (1987)) and metabolic variations in the hepatic
metabolism of S-mephenytoin. The molecular basis for these
polymorphisms has not been characterized. 2C8 (SEQ. ID. No.
8) appears to encode the protein for HLx on the basis of its
N-terminal amino acid sequence (Okino et al., J. Biol. Chem.
262:16072-16079 (1987); Wrighton et al., supra; Lasker et al.,
Biochem. Biophys. Res. Commun. 148:232-238 (1987)).


Example 2: Sequence Analysis 

The Bluescript plasmids containing the positive cDNA
inserts from the low-HLx library were purified by CsCl
gradients, while the plasmids containing cDNA inserts from the
high-HLx library were purified by using Qiagen plasmid
purification kits (Qiagen, Inc., Studio City, CA). The
double-stranded cDNA inserts were sequenced by the dideoxy
chain termination method reported in Sanger et al., J. Mol.
Biol. 162:729-773 (1982), using Sequenase kits (U.S.
Biochemical Corp., Cleveland, OH). The full-length clones 65
(SEQ. ID. No. 10), 25 (SEQ. ID. No. 4), 7b, 11a (SEQ. ID.
No. 2), 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No. 12) were
sequenced completely in both directions with primers spaced
approximately 20 bases apart. The remaining positive clones
from the high-HLx cDNA library were sequenced in both
directions through both the 5' and 3' ends and through all the
regions which would identify any of the known allelic
variants.

The majority of the clones (50) isolated from the
library from liver S33 coded for 2C9. Interestingly, all of
the 50 clones appeared to be one of two 2C9 allelic variants,
typified by the full-length clones 65 (SEQ. ID. No. 10) and 25
(SEQ. ID. No. 4). All of these clones were sequenced through
the 5' and 3' ends and through regions which would identify
known allelic variants. Thirty-nine of the 2C9 clones were
identical with clone 65 (SEQ. ID. No. 10), and 11 were
identical with clone 25 (SEQ. ID. No. 4).

The nucleotide sequences for clone 65 (SEQ. ID. No.
10) and clone 25 (SEQ. ID. No. 4) are shown in Figure 2.
Clones 25 (SEQ. ID. No. 4) and 65 (SEQ. ID. No. 10) were
identical in the 5'- and 3'-noncoding regions but contained
two single-base changes at positions 1075 and 1425. One of
these base changes was conservative, but the second would
result in one amino acid difference at position 359
(isoleucine versus leucine). Clone 65 (SEQ. ID. No. 9) is
identical in amino acid sequence with human form 2, although
it differs by two silent changes in the coding region and four
differences in the noncoding region (Yasumori et al., 1987).
Clone 65 (SEQ. ID. No. 9) contained a leucine instead of an
isoleucine at position 4, a valine instead of a serine at
position 6, and an arginine instead of a cysteine at position
144 compared to the 2C9 sequenced by Kimura et al. (1987).
The 2C9 reported by Meehan et al. has substitutions at
positions 144, 175, and 238 compared to the clones obtained in
this invention (Meehan et al., Am. J. Hum. Genet. 42:26-37
(1988)).

The remaining clones characterized from the human
liver S33 cDNA library encode several novel P450 2C cDNAs.
Their DNA sequences are shown in Figure 2 and their percent
homology with other known 2C members shown in Table II. Two
of these clones, 29c (SEQ. ID. No. 6) and 6b (SEQ. ID. No.
12), differ by one nucleotide in the coding region (position
1154), which would result in a single amino acid change
(threonine vs methionine at position 385). Clone 29c (SEQ.
ID. No. 6) had a very long (198 bp) 5'-noncoding sequence and
a polyadenylation signal 21 bases from the poly(A) tail.
Clone 6b (SEQ. ID. No. 12) had an unusually long 3'-noncoding
region containing three possible polyadenylation signals with
no poly(A) tail. The differences in the 3'-noncoding region
could represent alternate splicing, allelic variants, or
possibly separate genes. However, these clones are designated
as allelic variants of 2C18 because they differ by only one
base in the coding region. They are most similar to 2C9 (82%
amino acid homology) and 2C19 (SEQ. ID. No. 2) (81% amino acid
homology) (Table II).
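
The correspondence between the nucleotide positions and the amino
acid positions quoted in this example (1075 and 359, 1154 and 385)
is ordinary codon arithmetic. The sketch below assumes that coding
region numbering starts at 1 with the first base of the initiation
codon; the function name is illustrative.

```python
def codon_of(nt_position):
    """Map a 1-based coding nucleotide position to its codon.

    Returns (codon number, position within the codon), both 1-based.
    Assumes position 1 is the first base of the initiation codon.
    """
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


if __name__ == "__main__":
    for position in (1075, 1154, 1425):
        codon, offset = codon_of(position)
        print(f"nucleotide {position}: codon {codon}, base {offset} of 3")
```

Under this numbering, position 1425 falls on the third base of its
codon, which is where changes most often leave the encoded amino
acid unaltered.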

A third unique P450 2C cDNA, clone 11a (SEQ. ID.
No. 2) (designated 2C19), was also identified. 2C19 is 92%
homologous in its amino acid sequence to 2C9, 81% homologous
to 2C18, and 79% homologous to 2C8 (SEQ. ID. No. 8). Clone
11a (SEQ. ID. No. 2) had a short 5'-leader sequence and
contained the stop codon, but did not have a polyadenylation
signal or poly(A) tail. Interestingly, no clones for 2C10
(MP-8) were isolated from either library, despite the
sequencing of the 3' region of all 50 putative 2C9 clones.

TABLE II

Percent Homology for Nucleotide
and Amino Acid Sequences of P450 2C cDNAs*

Clone              2C8               2C9    29c (2C18)        11a (2C19)
                   (SEQ ID NO. 8)           (SEQ ID NO. 6)    (SEQ ID NO. 2)

29c (2C18)         84                86     100               86
(SEQ ID NO. 6)     89                93     100               93

11a (2C19)         83                94     86                100
(SEQ ID NO. 2)     91                96     93                100

* For each comparison, the upper value represents percent
nucleotide homology, and the lower value represents
percent amino acid homology. The nucleic acid
comparisons include both the coding and 3'-noncoding
regions. The 2C9 sequence used in this comparison was
the cDNA sequence for clone 65.

Figure 4 shows the alignment comparisons for the
deduced amino acid sequences of all known members of the human
CYP2C family, including the three new P450s of the present
invention. The 7 proteins, along with the consensus sequence,
can be aligned with no gaps, and each is predicted to be 490
amino acids long. The amino acid sequences show marked
similarities with many regions of absolute conservation.
Regions of marked conservation are noted from 131 to 180, and
from 302 to 460. These human P450 2C protein sequences also
demonstrate hypervariable regions which may be important for
interactions between the enzyme and substrate. These include
the regions from 181-120 and 220-248 as well as 283-296 and a
short region near the carboxyl terminus at 461-479. Notably,
it has been reported that a putative recognition site for
phosphorylation of P450 by cAMP-dependent kinase for P450 2B1
(Arg-Arg-Phe-Ser) at positions 124-127 was conserved in 2C8
(SEQ. ID. No. 8), 2C9, and 11a (2C19), suggesting that these
cytochromes might be regulated by phosphorylation (Muller et
al., FEBS Lett. 187:21-24 (1985)).

However, 2C18 did not contain a serine at this site. 
The overall percent homology for both nucleic acid and protein 
sequences is summarized in Table II. 
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Two additional full-length allelic variants of 2C9 
have been isolated. One of these clones is identical with
MP-4, but is full-length. It varies from the almost full-length
human form 2 isolated by Yasumori et al., supra, by only two
silent base changes in the coding region and by four changes
in the noncoding region. The number of differences in the
nucleic acid sequences of the presumed allelic variants
isolated by different laboratories ranges from 4 to 17, and
the number of amino acid changes varies from 0 to 4, as
illustrated in Figure 3. Two of the amino acid differences
occur within the first six N-terminal residues, the others
occurring singly throughout the sequence. The effect of these
changes on catalytic activity has not been systematically
studied. In Relling et al., J. Pharmacol. Exp. Ther.
252:442-447 (1990), it was reported that the expressed cDNAs
for 2C8 (SEQ. ID. No. 8) and 2C9 4'-hydroxylated racemic
mephenytoin but did not metabolize (S)-mephenytoin. However,
the form of isolated 2C9 (human form 2) which is described in
Yasumori et al. (1990) metabolized (S)-mephenytoin
preferentially when expressed in yeast. These forms differed
by only three amino acids. In contrast, Brian et al.,
Biochemistry 28:4993-4999 (1989) found that when a full-length
MP-8 (constructed with the first 15 nucleotides predicted from
the known amino acid sequence of P450MP-1) was expressed in
yeast, it did not metabolize (S)-mephenytoin. This form would
differ from human form 2 by only two amino acids. Thus, the
role of 2C9 in (S)-mephenytoin metabolism remains
controversial.

Example 3: Human RNA Blot Analysis and Hybridization
Conditions

Poly(A+) RNA (10 µg) was electrophoresed in a 1%
agarose gel under denaturing conditions and transferred to a
Nytran filter (Micron Separation, Inc., Westboro, MA), and
filters were then baked for 2 h at 80°C. The filters were
prehybridized for 2 h, then hybridized overnight with a
32P-labeled specific oligonucleotide probe for 2C8 (SEQ. ID.
No. 8) (T300R) at 42°C, washed 3 x 5 min at room temperature
and 1 x 5 min at 42°C with 2 x SSC/0.1% SDS, and
radioautographed. Filters were then stripped with 5 mM Tris
(pH 8.0), 0.2 mM EDTA, 0.05% sodium pyrophosphate, and 0.1 x
Denhardt's for 2 h at 65°C and rehybridized with a
random-primed actin cDNA (Oncor, Gaithersburg, MD) at 50°C
using 6 x SSC, 4 x Denhardt's, and 0.5% SDS. These filters
were washed 1 x 5 min at room temperature, 1 x 10 min at 48°C,
and 4 x 15 min at 48°C and radioautographed as before. The
2C8 mRNA band was quantitated by scanning with an LKB
Ultrascan laser densitometer, and the values of the integrated
peaks were divided by those of the actin peaks.
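
The normalization described above is a simple ratio of integrated densitometer peaks. The sketch below illustrates it under the assumption that each lane yields one integrated 2C8 peak and one integrated actin peak in the same arbitrary units; the function names and the example numbers are hypothetical and are not measurements from the specification.

```python
def normalize_to_actin(target_peak: float, actin_peak: float) -> float:
    """Return the 2C8 signal expressed relative to the actin loading control."""
    if actin_peak <= 0:
        raise ValueError("actin peak must be positive")
    return target_peak / actin_peak


def fold_lower(reference_ratio: float, sample_ratio: float) -> float:
    """How many times lower the sample's normalized signal is than the reference."""
    if sample_ratio <= 0:
        raise ValueError("sample ratio must be positive")
    return reference_ratio / sample_ratio


if __name__ == "__main__":
    # Hypothetical integrated peak areas (arbitrary densitometer units).
    reference = normalize_to_actin(target_peak=700.0, actin_peak=1000.0)
    sample = normalize_to_actin(target_peak=9.0, actin_peak=900.0)
    print(fold_lower(reference, sample))
```

Dividing by the actin peak corrects for lane-to-lane differences in the amount of RNA loaded, so that fold comparisons between liver samples reflect 2C8 mRNA abundance rather than loading.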

Hybridization with T300R was negligible in mRNA from
860624 compared to S33 and a number of other liver samples
(Figure 5). When corrected for hybridization with the actin
probe, the amounts of 2C8 (SEQ. ID. No. 8) mRNA were
consistent with the relative amounts of HLx observed in
Western blot analysis. Laser scans of the autoradiographs
indicated that 2C8 (SEQ. ID. No. 8) mRNA levels in sample
860624 were at least 70-fold lower than in S33 and 3- to
15-fold lower than in any of the remaining samples.

Example 4: Cell Expression Studies

cDNA inserts were ligated into the cloning region of
the expression plasmids pSVL (Pharmacia LKB Biotechnology,
Inc., Piscataway, NJ) or pcD (Okayama et al., Mol. Cell. Biol.
3:280-289 (1983)) and used to transform COS-1 cells. COS-1
cells were placed at (1-2) x 10^6 cells per 1-cm dish and grown
for 24 h in Dulbecco's modified Eagle's medium with 10% fetal
bovine serum (DMEM). The cells were then washed with
Dulbecco's phosphate-buffered saline (PBS) and transfected
with recombinant plasmid (3 µg per dish) in DEAE-dextran (500
µg/mL) for 30 min to 1 h at 37°C. The transfected cells were
then treated with chloroquine (52 µg/mL) in DMEM for 5 h
(Luthman et al., Nucleic Acids Res. 11:1295-1308 (1983)),
washed with PBS, refed with DMEM, and incubated for 72 h prior
to harvest. Typically, 15-20 dishes were transfected with
each recombinant plasmid. For Western blot analysis of the
recombinant transformed COS-1 cells, cells were scraped from
the dishes into buffer (50 mM Tris-HCl, pH 7.5, 150 mM KCl, and
1 mM EDTA) and lysed with 3 x 5 s bursts with a Polytron. A
portion of each lysate was centrifuged at 9000g and then
10000g for the preparation of a microsomal fraction. Western
blots were then performed as described above. Total RNA was
isolated from transfected COS-1 cells, and Northern blots were
performed as described for human samples. The filters were
hybridized with a 32P-labeled oligonucleotide probe which
hybridizes with all 2C clones isolated (2C500R)
(5'-GGAGCACAGCCCAGGATGAA-3') (SEQ. ID. No. 20) at 55°C, and
radioautographed.

The two variant cDNAs for 2C9, the two variant cDNAs
for 2C18, and the cDNA for 2C19 were inserted into expression
vectors and transfected into COS-1 cells. Cell lysates were
prepared and immunoblotted by using antibody to HLx and P450
2C9. The results are shown in Figure 4. Transfection of
COS-1 cells with the two variants of 2C9 (25 (SEQ. ID. No. 4)
and 65 (SEQ. ID. No. 10)) resulted in the expression of a
protein (SEQ. ID. No. 3) with a molecular weight equal to that
of pure 2C9. In contrast, neither 2C18 (either variant) nor
2C19 was detected by antibody to HLx or 2C9. However,
Northern blot analysis indicated that all three cDNAs had been
successfully transfected into these cells. The sizes of the
transcripts were those expected for the constructs. The
somewhat lesser hybridization of the 2C oligoprobe with RNA
from cells transfected with 11a (SEQ. ID. No. 2) reflects a
lower amount of RNA in this sample, as shown by the
hybridization with the actin probe.

Example 5: Expression of Cytochrome P450 2C19 and 2C18
Polypeptides in a Stable Cell Line

1. Materials

(a) Liver Samples and Chemicals
Human liver samples were obtained from Dr. Fred
Guengerich, Vanderbilt University, Nashville, TN.
Restriction endonucleases were purchased from Stratagene
Cloning Systems (La Jolla, CA). [α-32P]dCTP (3000 Ci/mmol),
[γ-32P]ATP (5000 Ci/mmol) and [α-35S]dATP (650 Ci/mmol) were
from Amersham Corp. (Arlington Heights, IL). Nirvanol was
obtained from Adrian Küpfer, University of Berne, Switzerland
and separated into its R- and S-enantiomers as described by
Sobotka et al., J. Amer. Chem. Soc. 54:4697-4702 (1932).
Radiolabeled S- and R-mephenytoin (N-methyl-14C) were
synthesized by E.I. DuPont de Nemours & Co., Inc. (Wilmington,
DE) by methylation of R- and S-nirvanol. The radiochemical
purity of both isomers was greater than 90% as assessed by
HPLC. A single impurity which accounted for less than 2% of
the parent compound was not characterized, since it eluted
after the metabolites and parent compound. Moreover, the
percentage of the impurity remained the same (less than 2%)
before and after incubations. All sequencing was done by the
dideoxy method using Sequenase Kits (U.S. Biochemical Corp.,
Cleveland, OH). The specific activities of the S- and
R-enantiomers were 20.7 and 20.9 mCi/mmol, respectively. All
other reagents used are listed below or were of the highest
quality available.

(b) Additional Sequences of 2C cDNAs Used in the
Expression Studies

Two full-length clones of 2C8 (7b and 7c) described
in Romkes et al., Biochemistry 30:3247-3255 (1991), were
sequenced through the coding region in the present study. The
sequences were similar to that of the 2C8 (HP1-1) reported by
Okino et al., supra; however, both clones had coding changes
at position 390 (A→C; Asn130→Thr) and at position 792 (G→C;
Met264→Ile) and a change in the noncoding region at
1497 (T→C). These changes presumably represent a second
allelic variant of 2C8. The Thr130 and Ile264 amino acids
found in our 2C8 clones are conserved in the remainder of the
human P450 2C subfamily (2C9, 2C18, and 2C19) and are
therefore consistent with the amino acid substitutions in
other members of this subfamily.
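
The nucleotide positions and residue numbers quoted above are related by simple codon arithmetic. The following sketch assumes that coding-region numbering begins with the A of the initiating ATG as position 1, which is an assumption made here for illustration; the helper names are hypothetical.

```python
def codon_number(nt_position: int) -> int:
    """Residue number of the codon containing a 1-based coding nucleotide."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1


def position_in_codon(nt_position: int) -> int:
    """Return 1, 2 or 3: the place of the nucleotide within its codon."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) % 3 + 1


if __name__ == "__main__":
    for nt in (390, 792):
        print(nt, codon_number(nt), position_in_codon(nt))
```

Under this numbering, positions 390 and 792 fall in codons 130 and 264, matching the residue numbers given for the two coding changes.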



(c) Yeast Strains and Media

Saccharomyces cerevisiae 334 (MATα, pep4-3, prb1-1122,
ura3-52, leu2-3,112, reg1-501, gal1), a protease-deficient
strain kindly provided by Dr. Ed Perkins (NIEHS),
was used as the recipient strain in these studies and
propagated non-selectively in YPD medium (1% yeast extract, 2%
peptone, 2% dextrose) (Hovland et al., Gene 83:57-64 (1989)).
For the selection of Leu+ transformants, the cells were grown
in synthetic complete medium minus leucine (Rose et al.,
Methods in Yeast Genetics (Rose et al., eds.) pp. 180-187,
C.S.H.P., NY 1990). Plates were made by the addition of 2%
agar.

2. Methods

(a) Amplification of 2C18 and 2C9 RNA for Direct
Sequencing

Total RNA from selected human liver samples was
isolated by the single-step method (Chomczynski et al., Anal.
Biochem. 163:156-159 (1987)), using TRI REAGENT™ (Mol. Res.
Center, Inc., OH). RNA (10 µg) was reverse transcribed using
2.6 µM random hexamers as the 3'-primer by incubating for
1 hour at 42°C using 2.5 U/µl of M-MLV reverse transcriptase
(BRL, Grand Island, NY) in 10 mM Tris-HCl, pH 8.3, 5 mM KCl,
5 mM MgCl2, 1 U/µl RNase inhibitor (Promega, Madison, WI) and
1 mM each of dATP, dCTP, dGTP, and dTTP (Perkin Elmer Cetus,
Norwalk, CT). The samples were then heated for 5 minutes at
99°C to terminate the reverse transcription.

The cDNA was then amplified for a region containing
the allelic differences in 2C18 and 2C9 using a nested PCR
method. The DNA was amplified in 1X PCR buffer (50 mM KCl,
10 mM Tris-HCl, pH 8.3) containing 1 mM MgCl2, 0.2 mM each of
dATP, dCTP, dGTP, dTTP and 20 pmol of each of the 5' and 3'
primers in a final reaction volume of 100 µl. The reaction
mixture was heated at 94°C for 5 minutes before addition of
2.5 U of AmpliTaq DNA polymerase (Perkin Elmer Cetus). For
PCR of 2C18, the 3'-primer was 5'-TGGCCCTGATAAGGGAGAAT-3'
(SEQ. ID. No. 23) and the 5'-primers were
5'-ATCCAGAGATACATTGACCTC-3' (SEQ. ID. No. 24) (outer) and
5'-CCATGAAGTGACCTGTGATG-3' (SEQ. ID. No. 25) (inner). For
2C9, the 3'-primer was 5'-AAAGATGGATAATGCCCCAG-3' (SEQ. ID.
No. 26) and the 5'-primers were 5'-GAAGGAGATCCGGCGTTTCT-3'
(SEQ. ID. No. 27) (outer) and 5'-GGCGTTTCTCCCTCATGACG-3'
(SEQ. ID. No. 28) (inner). The outer amplification was
performed for 20 cycles consisting of denaturation at 94°C for
1 minute, annealing at the appropriate temperature for
30 seconds, and extension at 72°C for 1 min. After a 50-fold
dilution, PCR was carried out similarly with the inner primers
for 35 additional cycles.
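
The annealing step above is specified only as "the appropriate temperature." One common rule of thumb for short oligonucleotides is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in °C. The sketch below applies it to the 2C18 3'-primer; the use of this rule is an assumption introduced here for illustration and is not stated in the specification, and the function names are hypothetical.

```python
def base_counts(primer: str) -> dict:
    """Count A, C, G and T in a primer, rejecting any other symbol."""
    seq = primer.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    if sum(counts.values()) != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return counts


def gc_fraction(primer: str) -> float:
    """Fraction of bases that are G or C."""
    counts = base_counts(primer)
    return (counts["G"] + counts["C"]) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature: 2 per A/T plus 4 per G/C (deg C)."""
    counts = base_counts(primer)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


if __name__ == "__main__":
    primer_2c18_3prime = "TGGCCCTGATAAGGGAGAAT"  # SEQ. ID. No. 23 as printed
    print(gc_fraction(primer_2c18_3prime), wallace_tm(primer_2c18_3prime))
```

An estimate of this kind is only a starting point; in practice the annealing temperature is usually set a few degrees below the lower of the two primer estimates and then adjusted empirically.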

The PCR products were purified using a Centricon-30,
dried, suspended in 40 µl of sterile water, and sequenced
using Sequenase Kits and a 33P-end-labeled sequencing primer.
For 2C18, the primer used was 2C18.1184R 5'-TTGTCATTGTGCAG-3'
(SEQ. ID. No. 29). Sequencing primers for 2C9 were 2C9.1030F
5'-CACATGCCCTACACA-3' (SEQ. ID. No. 30), 2C9.385F
5'-TGACGCTGCGGAATT-3' (SEQ. ID. No. 31), and 2C9.783F
5'-GGACTTTATTGATTG-3' (SEQ. ID. No. 32).

Full-length 2C9 cDNA was also amplified by PCR from
a human liver with high S-mephenytoin 4'-hydroxylase activity
using the primers 5'-ATGATTCTCTTGTGGTCCT-3' (SEQ. ID. No. 33)
and 5'-AAAGATGGATAATGCCCCCAG-3' (SEQ. ID. No. 34). The PCR
reaction was similar to above, except that the primer
concentrations were increased 10-fold (0.25 µM). The PCR
products were then cloned into the pCR1000 vector using the TA
Cloning System (Invitrogen, San Diego, CA) and sequenced to
identify the allelic variant present.

(b) Plasmid Construction and Methods for Amplifying
Full-length 2C18 and 2C19 cDNAs by PCR

The strategy for cloning the P450 2C cDNAs into the
yeast vector pAAH5 is described below. The 5'-noncoding
sequence of the P450 2C cDNAs was eliminated by PCR
amplification to optimize expression in yeast cells. The
5'-primer introduced a HindIII cloning site and a six A-residue
consensus sequence upstream of the ATG codon to promote
efficient translation in yeast (Hamilton et al., Nucl. Acids
Res. 15:3581-3593 (1987); Cullin et al., Gene 65:203-217
(1988)). The 3'-primer was positioned between the stop codon
and polyadenylation site and introduced a second HindIII
site. cDNA inserts in the pBluescript vector (0.1 µg) (Romkes
et al. (1991), supra) were amplified by PCR as described
before except that the reaction contained 3.5 mM MgCl2,
0.25 µM each of the 5'- and 3'-primers, and 1 µl PerfectMatch
(Stratagene, La Jolla, CA). Amplification was performed in
sequential cycles, with the first cycle including denaturation
for 1 min at 94°C, annealing at the appropriate temperature
for 1 min, and polymerization at 72°C for 3 min. The
remaining 24 cycles consisted of a denaturation step at 94°C
for 1 min and a combined annealing/extension step at 72°C for
3 min. After the last cycle, all samples were incubated an
additional 10 min at 72°C. The primers used were:

2C8: 5'-GCAAGCTTAAAAAAATGGAACCTTTTGTGGTCCT-3' (SEQ. ID.
No. 35) and 5'-GCAAGCTTGCCAGATGGGCTAGCATTCT-3' (SEQ. ID.
No. 36); 2C9: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 37) and 5'-GCAAGCTTGCCAGGCCATCTGCTCTTCT-3' (SEQ. ID.
No. 38); 2C19: 5'-GCAAGCTTAAAAAAATGGATTCTCTTGTGGTCCT-3' (SEQ.
ID. No. 39) and 5'-GCAAGCTTGCCAGACCATCTGTGCTTCT-3' (SEQ. ID.
No. 40).
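
The design rule described for the 5'-primers (a HindIII site followed by six A residues immediately upstream of the ATG) can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, AAGCTT is the HindIII recognition sequence, and the example strings are assembled in code from the stated design rule rather than copied from the sequence listing.

```python
HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence
A_LEADER = "A" * 6        # six A residues placed upstream of the start codon
START_CODON = "ATG"


def has_yeast_expression_leader(primer: str) -> bool:
    """True if the primer carries HindIII site + six A's + ATG, in that order."""
    motif = HINDIII_SITE + A_LEADER + START_CODON
    return motif in primer.upper()


if __name__ == "__main__":
    # Hypothetical 5'-primer built from the stated design rule.
    five_prime = "GC" + HINDIII_SITE + A_LEADER + START_CODON + "GATTCTCTTG"
    # Hypothetical 3'-primer: HindIII site only, no leader or start codon.
    three_prime = "GC" + HINDIII_SITE + "GCCAGACC"
    print(has_yeast_expression_leader(five_prime))
    print(has_yeast_expression_leader(three_prime))
```

A check like this is useful before ordering oligonucleotides, since a single misplaced base in the leader changes the sequence context of the initiating ATG.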

The PCR products were cloned into the pCR1000 vector
(Invitrogen, San Diego, CA). Recombinant plasmids were
isolated from E. coli (INVαF') cells using Qiagen plasmid
purification kits, and the PCR products were completely
sequenced as described above to verify the fidelity of the PCR
reaction. A mutation of Asp2→Val was initially introduced
inadvertently in 29c via the primers utilized, due to an error
in the original sequencing at this position. Therefore, the
correct 2C18-Asp2 cDNAs were cloned into the pAAH5 vector by
an alternate strategy. The 3'-end was cut with NdeI, blunted,
and ligated to a SmaI/HindIII adapter. The clone was then
partially digested with BamHI, which cuts after the initiation
ATG as well as internally, and the intact 1700-bp fragment was
gel purified. A BamHI/HindIII linker was prepared from the
oligos 5'-AGCTTAAAAAAATG-3' (SEQ. ID. No. 41) (upper) and
5'-GATCCATTTTTTTA-3' (SEQ. ID. No. 42) (lower), annealed, and
ligated to the cDNA fragment to introduce a HindIII cloning
site and regenerate the ATG codon.

The PCR-amplified cDNAs were isolated by HindIII
digestion and ligated into the pAAH5 yeast expression vector,
and the proper orientation was confirmed by restriction
analysis and sequencing. The expression vector pAAH5, which
contains the yeast ADH1 promoter and terminator regions and
the Leu2 selectable marker, was kindly provided by Dr. M.
Negishi (NIEHS). The recombinant plasmids were isolated from
E. coli DH5α cells using Qiagen plasmid purification kits and
transformed into yeast as described previously (Faletto et
al., J. Biol. Chem. 267:2032-2037 (1992)), using the lithium
acetate method of Ito et al., J. Bacteriol. 153:163-168
(1983).


(c) Immunoblots and Cytochrome P450 Determinations
Yeast microsomes or whole cell lysates were prepared
from transformed cells isolated at mid-logarithmic phase as
described previously (Oeda et al., supra) with slight
modifications (Faletto et al., supra) and stored at -80°C in
0.1 M phosphate (pH 7.4) containing 20% glycerol and 0.1 mM
EDTA. Protein concentrations were determined by the method of
Bradford et al., Anal. Biochem. 72:248-254 (1976).
SDS-polyacrylamide gel electrophoresis and Western blots were
performed on yeast microsomes or whole cell lysates (Faletto
et al., supra) and immunoblots probed with antibody to the
appropriate P450 as described (Yeowell et al., Arch. Biochem.
Biophys. 243:408-419 (1985)). Cytochromes P450 2C8, P450 2C9
and NADPH:P450 reductase were purified from human liver
microsomes (Raucy et al., Methods in Enzymol. 208:577-587
(1991)) and antibodies to 2C8 and 2C9 prepared in rabbits as
previously described (Leo et al., Arch. Biochem. Biophys.
269:305-312 (1988)). Specific peptides
NH2-CIDYLPGSHNKIAENFA-COOH (SEQ. ID. No. 43) (amino acids
231-249) for P450 2C18 and NH2-CLAFMESDILEKVK-COOH (SEQ. ID.
No. 44) (amino acids 236-249) for 2C19 were selected from
amino acid regions where these P450s vary from other known 2C
subfamily members (Romkes et al. (1991), supra). These
peptides were synthesized, conjugated to bovine serum albumin
via m-maleimidobenzoyl-N-hydroxysuccinimide ester, and
antibodies to the conjugates raised in rabbits by BIOSYNTHESIS
INC. (Denton, TX). E. coli lysate (4 mg/ml) was added to the
primary peptide antibody in the first step of the immunoblot
procedure to block non-specific reactions of these rabbit
antibodies to yeast cell wall proteins. Cytochrome P450
concentrations of microsomes were determined by
dithionite-reduced carbon monoxide difference spectra by the
method of Omura et al., J. Biol. Chem. 239:2370-2378 (1964)
using an extinction coefficient of 91 mM⁻¹ cm⁻¹.
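
The spectral determination reduces to the Beer-Lambert relation with the stated extinction coefficient of 91 mM⁻¹ cm⁻¹. The sketch below assumes that the measured quantity is the absorbance difference between the peak near 450 nm and a 490 nm reference in the reduced CO difference spectrum, as is conventional for this method; that wavelength pair, the 1 cm default path length, and the function names are assumptions made for illustration.

```python
EXTINCTION_MM_CM = 91.0  # mM^-1 cm^-1, the coefficient stated in the text


def p450_micromolar(delta_absorbance: float, path_cm: float = 1.0) -> float:
    """P450 concentration in micromolar from the CO difference absorbance."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    millimolar = delta_absorbance / (EXTINCTION_MM_CM * path_cm)
    return millimolar * 1000.0


def specific_content_pmol_per_mg(delta_absorbance: float,
                                 protein_mg_per_ml: float,
                                 path_cm: float = 1.0) -> float:
    """Specific content in pmol P450 per mg microsomal protein."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    nmol_per_ml = p450_micromolar(delta_absorbance, path_cm)  # 1 uM = 1 nmol/ml
    return nmol_per_ml * 1000.0 / protein_mg_per_ml


if __name__ == "__main__":
    # Hypothetical reading: delta-A of 0.0091 at 2 mg protein per ml.
    print(specific_content_pmol_per_mg(0.0091, 2.0))
```

Expressing the result per mg of microsomal protein gives values directly comparable to the pmol/mg expression levels reported for the recombinant yeast microsomes.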

Microsomes of human livers were prepared as
described by Raucy et al., supra. SDS-polyacrylamide gel
electrophoresis and immunoblot analysis were performed as above
except that immunoblots were developed using the ECL (enhanced
chemiluminescence) Western blotting kit from Amersham (UK).
Immunoblots were scanned with a laser densitometer (LKB
Instruments).

(d) Purification of Cytochromes from Recombinant
Yeast Microsomes

Recombinant yeast microsomes were prepared from a
10-12 L culture, and recombinant P450s were purified by
aminooctyl-Sepharose chromatography as described by Iwasaki et
al., J. Biol. Chem. 266:3380-3382 (1991). The Emulgen was
then removed from the protein by adsorption of the protein to
a 4 g hydroxylapatite column (Hypatite C, Clarkson Chemical
Company, Williamsport, PA) equilibrated with 10 mM potassium
phosphate buffer (pH 7.2), 20% glycerol, 0.1 mM EDTA, and
0.1 mM DTT and washing the column with the same buffer until
the absorbance at 280 nm returned to zero. The P450 was then
eluted with 4090 mM DTT, and dialyzed overnight against 100 mM
potassium phosphate buffer (pH 7.4), 20% glycerol and 0.1 mM
EDTA. Absolute and CO difference spectra of purified P450s
were determined in the same buffer but containing 0.2% Emulgen
and 0.5% cholate.

(e) Tolbutamide Hydroxylase Assays
Tolbutamide hydroxylase activity was measured
according to Knodell et al., J. Pharmacol. Exper. Ther.
241:1112-1119 (1987), with several modifications. Yeast
microsomes (1 mg protein) were preincubated with 300 pmol
hamster P450 reductase in 0.2 ml of the incubation buffer
(below) for 3 min at 37°C. The reaction was then placed on
ice and incubated in 0.2 ml of 50 mM HEPES buffer (pH 7.4)
containing 1.5 mM MgCl2, 0.1 mM EDTA in a final volume of 1 ml
and 1 mM sodium tolbutamide. The reaction was initiated with
0.5 mM NADPH. Human liver microsomes (0.22 mg protein) were
incubated without reductase. Incubations with reconstituted
recombinant P450s contained 50 pmol purified P450 enzyme,
150 pmol P450 reductase, and 15 µg
dilauroylphosphatidylcholine, and were performed in 100 mM
potassium phosphate buffer (pH 7.4). Reactions were
terminated after 60 min at 37°C by the addition of 50 µl of
4N HCl, followed by extraction with 3 ml of water-saturated
ethyl acetate. The ethyl acetate extracts were dried under
nitrogen at 40°C, the residue resolubilized in 200 µl
methanol, and 4-hydroxytolbutamide then assayed using HPLC by
injecting 50 µl of the solubilized extract onto a µBONDAPAK
C18 column (4.6 x 300 mm) using 0.05% phosphoric acid,
pH 2.6:acetonitrile (6:4, v/v) as the mobile phase with a flow
rate of 1 ml/min. The column eluate was monitored at 230 nm
and rates of product formation were determined from standard
curves prepared by adding varying amounts of
4-hydroxytolbutamide to incubations conducted without NADPH.
Preliminary experiments confirmed that 4-hydroxytolbutamide
formation by human liver microsomes (30-120 pmol P450) was
linear for up to 90 min. Samples were analyzed in triplicate.
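
Quantitation from a standard curve, as described above, can be sketched as an ordinary least-squares line relating the amount of 4-hydroxytolbutamide added to the resulting HPLC peak area, which is then inverted for unknown samples. The assumption of a linear detector response, the function names, and the example numbers are all illustrative and are not taken from the specification.

```python
def fit_line(amounts, peak_areas):
    """Least-squares slope and intercept of peak area versus amount added."""
    n = len(amounts)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_peak(peak_area, slope, intercept):
    """Invert the standard curve to estimate the amount of product."""
    return (peak_area - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: nmol added versus integrated peak area.
    slope, intercept = fit_line([0, 1, 2, 4], [1, 3, 5, 9])
    print(amount_from_peak(7, slope, intercept))
```

Preparing the standards in complete incubations without NADPH, as the text describes, means the curve already accounts for extraction losses, so no separate recovery correction appears in the sketch.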

(f) Mephenytoin 4'-Hydroxylase Assay
Mephenytoin 4'-hydroxylase activity was measured by
a modification of the radiometric HPLC assay described by
Shimada et al., J. Biol. Chem. 261:909-921 (1986), as
described below. Purified P450s or recombinant yeast
microsomes (10-50 pmol) were preincubated with
dilauroylphosphatidylcholine (15 µg per 50 pmol P450), P450
reductase (500 U per 50 pmol P450), and human cytochrome b5
(2:1 molar ratio when added). The reconstituted mixture was
preincubated for 5 min at 37°C, and then placed on ice. A
final concentration of 0.4 mM radiolabeled S- or
R-mephenytoin (20.7 mCi/mmol and 20.9 mCi/mmol, respectively)
was added in 50 mM HEPES buffer (pH 7.4) containing 0.1 mM
EDTA and 1.5 mM MgCl2 for recombinant 2C proteins. The
mixture was then incubated at 37°C with shaking for 3 min, and
the reaction started with the addition of 2 mM NADPH and
terminated after 30 min with an equal volume of methanol.
Cytochrome b5 was not included in all CYP2C18 reactions, since
it had no effect or produced a slight inhibition on the
activity of this CYP protein. Reaction volumes were generally
0.25 ml except when the volume of recombinant purified
cytochrome or yeast microsomes was greater than 50 µl. In
these cases, the volume was increased to 0.5 ml to limit the
volume of glycerol from the purified preparation to <4% of the
final volume. Incubations with human microsomes did not
contain exogenous P450 reductase or cytochrome b5, and they
were carried out in 0.1 M phosphate buffer (pH 7.4) instead of
HEPES buffer. Initial experiments showed that S-mephenytoin
hydroxylase activity of human liver microsomes was linear for
at least 60 minutes and from 0.05 through 0.2 mg microsomal
protein, and that of the R-enantiomer was linear through 1 mg
microsomal protein.
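
In a radiometric assay of this kind, the radioactivity recovered in the product peak is converted to a molar amount through the substrate specific activity and then to a turnover rate. The sketch below assumes that the detector counts have already been corrected to disintegrations per minute and scaled to the whole incubation; those assumptions and the function names are illustrative. The conversion factor 1 mCi = 2.22 x 10^9 dpm is standard.

```python
DPM_PER_MCI = 2.22e9  # disintegrations per minute in one millicurie


def dpm_per_nmol(specific_activity_mci_per_mmol: float) -> float:
    """Convert a specific activity in mCi/mmol to dpm per nmol."""
    return specific_activity_mci_per_mmol * DPM_PER_MCI / 1e6  # 1 mmol = 1e6 nmol


def turnover_rate(product_dpm: float,
                  specific_activity_mci_per_mmol: float,
                  minutes: float,
                  p450_pmol: float) -> float:
    """Rate of product formation in nmol/min/nmol P450."""
    if minutes <= 0 or p450_pmol <= 0:
        raise ValueError("incubation time and P450 amount must be positive")
    product_nmol = product_dpm / dpm_per_nmol(specific_activity_mci_per_mmol)
    return product_nmol / minutes / (p450_pmol / 1000.0)


if __name__ == "__main__":
    # Hypothetical product peak for S-mephenytoin (20.7 mCi/mmol),
    # 30 min incubation with 10 pmol P450.
    print(turnover_rate(68931.0, 20.7, 30.0, 10.0))
```

The same function applies to the R-enantiomer by substituting its specific activity of 20.9 mCi/mmol.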

At the end of the incubation period, the reactions
were terminated with an equal volume of methanol. The
incubation mixture was centrifuged at 10,000g for 10 min and
an aliquot assayed directly using HPLC without extraction.
Samples with particularly low activity were concentrated by
lyophilization and redissolved in a small volume of
methanol:water (1:1) before assay. The HPLC system consisted
of a reverse phase C18 (10 µm) Versapak, 300 mm x 4.1 mm column
(Alltech Associates, Deerfield, IL) using an isocratic solvent
consisting of methanol:water (45:55) with a flow rate of
1 ml/min for 25 min. Detection of radioactive peaks was
accomplished using an on-line Flow-One radiochemical detector
(Radiomatic Instruments Co., Tampa, FL). Detection of the
unlabeled 4'-hydroxymephenytoin authentic standard was
performed using an on-line multiwavelength UV detector at both
211 and 230 nm.

(g) Statistical Analyses

Tolbutamide hydroxylase and mephenytoin hydroxylase
activities of microsomes prepared from different recombinant
yeasts were compared by analysis of variance and by Fisher's
least significant difference test (Carmer et al., J. Am. Stat.
Assoc. 68:66-74 (1973)).
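
The two procedures named above can be sketched in a few lines. The example computes a one-way analysis of variance and then the least significant difference for a pair of groups; the critical t value is supplied by the caller from a t table, and the replicate values are hypothetical. This is a generic textbook formulation and is not claimed to reproduce the exact software or options used for the reported results.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (F statistic, within-group mean square, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replicate values")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / (k - 1)) / ms_within
    return f_stat, ms_within, df_within


def fisher_lsd(ms_within, n_a, n_b, t_critical):
    """Smallest difference between two group means judged significant."""
    return t_critical * sqrt(ms_within * (1.0 / n_a + 1.0 / n_b))


if __name__ == "__main__":
    # Hypothetical triplicate activities for three preparations.
    groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    f_stat, ms_within, df_within = one_way_anova(groups)
    # Two-sided 5% critical t for 6 degrees of freedom, taken from a t table.
    lsd = fisher_lsd(ms_within, 3, 3, t_critical=2.447)
    print(f_stat, df_within, lsd)
```

In the usual protected form of the test, pairwise differences are compared with the least significant difference only when the overall F statistic is itself significant.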

3. Results

(a) Expression of P450 2C cDNAs in Yeast
Western blot analysis confirmed the expression of
the recombinant human CYP2C proteins in the recombinant yeast
(Fig. 6). Antibodies to 2C8 and 2C9 recognized polypeptide
bands of approximately 50,000 daltons (2C8) and 55,000 daltons
(2C9) which corresponded in mobility to those of the
recombinant proteins purified from yeast microsomes. These
mobilities corresponded to those of the corresponding 2C8 and
2C9 proteins purified from human liver. 2C19 was recognized
by antibodies to both the 2C9 and the 2C19 peptides. This
protein corresponded in mobility (<50,000 daltons) to the
lowest of three bands in Western blots of human liver
microsomes probed with antibody to human 2C9. The mobility of
2C18 was intermediate between that of 2C8 and 2C19.
Antibodies to 2C18 and 2C19 peptides were specific for their
antigen; however, antibody to 2C9 cross-reacted strongly with
2C19 and weakly with 2C8 and 2C18.

CO difference spectral analysis indicated that the
recombinant P450 2C proteins were expressed at levels as high
as 160-250 pmol/mg protein in some yeast microsomal
preparations. 2C18, 65 (2C9), and 25 (2C9) were expressed at
levels of 20 to 60 pmol/mg microsomal protein. Initially, 11a
(2C19) was expressed extremely poorly, and the CO difference
spectrum of the recombinant 2C19 yeast was indistinguishable
from that of control yeast (<7 pmol/mg protein). However,
after repeated transfections and selection, expression of 2C19
at 17 pmol/mg protein was achieved. All of the CYP2C
proteins were low spin hemoproteins. CYP2C18 appeared to be
somewhat unstable in yeast microsomes, with a large proportion
(~1/3 to 1/2) of the P450 being converted to P420 in the
presence of dithionite and carbon monoxide. None of the other
recombinant CYP2C proteins showed this lack of stability.

(b) Optimization of Tolbutamide and S-Mephenytoin
Hydroxylase Assays

Preliminary studies indicated that exogenous P450
reductase (500 U/50 pmol P450) stimulated metabolism of
tolbutamide by recombinant 2C9 in yeast microsomes >10-fold
and stimulated S-mephenytoin hydroxylase activity
approximately 2-fold. Activity of the recombinant 2C proteins
was linear with amount of P450 for 30 minutes through at least
20 pmol P450 for 2C19 (Fig. 7) and 50 pmol for the other CYP2C
forms. Cytochrome b5 stimulated S-mephenytoin hydroxylase
activity of both 2C9 and 2C19 in yeast microsomes, and the
optimal ratio of b5 to P450 was approximately 2:1, but it
generally had no effect or produced a slight inhibition of
mephenytoin hydroxylase activity of 2C18 (Fig. 8). This
difference is consistent with the fact that all of the CYP2C
proteins except 2C18 contain a Ser at position 128, which is
part of a recognition site for cAMP-dependent protein kinase
(Arg125-Arg-Phe-Ser128) (Müller et al., FEBS Lett. 187:21-24
(1985)), and this sequence is also thought to be part of a b5
binding site (Jansson et al., Arch. Biochem. Biophys.
259:441-448 (1987)); 2C18 contains Cys at position 125.

Mephenytoin 4'-hydroxylase activity of recombinant
yeast microsomes was consistently higher in HEPES than
phosphate buffer, while activity of human liver microsomes was
~2-fold higher in phosphate buffer (pH 7.4). Therefore,
recombinant proteins were subsequently assayed in HEPES buffer
with exogenous reductase and cytochrome b5, except for 2C18,
which was tested both with and without cytochrome b5. Human
liver microsomal activities were assayed in phosphate buffer.

(c) Mephenytoin Hydroxylase Activity of Recombinant
Human 2C Proteins

S-mephenytoin 4'-hydroxylase activities of yeast
microsomes containing recombinant human CYP2C proteins were
compared under the optimized conditions described above. HPLC
profiles of the metabolites of S-mephenytoin produced by human
liver microsomes and recombinant human CYP2C proteins are
shown in Fig. 9 and the results summarized in Table III.
Recombinant 2C19 4'-hydroxylated S-mephenytoin at a rate of
~5 nmol/min/nmol P450, which was an order of magnitude
higher than the rate of 4'-hydroxylation in human liver
microsomes (Table III and Fig. 9). The retention time
(5-6 min) of the 4'-hydroxymephenytoin metabolite was identical
to that of the authentic unlabeled standard. 2C19 also
produced small quantities of two unknown metabolites eluted at
3-4 and 7-8 min. These unknown metabolites were also produced
by liver microsomes, and the metabolite with the shorter
retention time was the principal metabolite produced by 2C8.
Parent S-mephenytoin eluted at 14-15 min, followed by the
unknown impurity which eluted at 16-17 min. Similar retention
times were observed for R-mephenytoin and its metabolites.

The rate of 4'-hydroxymephenytoin formation by 2C19
was at least 100-fold higher than that of 2C9 (both alleles),
2C18 (both alleles) and 2C8 (Table III). The rate of
4'-hydroxylation of S-mephenytoin by 2C8 appeared to be lower
than that of 2C9 (0.02 nmol/min/nmol). The 4'-hydroxylation
of mephenytoin by 2C19 was stereospecific; the rate of
S-hydroxylation was at least 30-fold higher than that of
R-hydroxylation (Table III). In contrast, the 4'-hydroxylation
of mephenytoin by the other human CYP2C proteins did not
appear to be stereospecific.

TABLE III

S-Mephenytoin 4'-Hydroxylase Activities in
Recombinant Human CYP2C Yeast Microsomes

                            Mephenytoin 4'-Hydroxylase Activity
                                  (nmol/min/nmol P450)
Microsomes                      S                      R              R/S Ratio

Controls                   0.028 ± 0.001          0.024 ± 0.003         0.9
2C9-Ile359 (65)            0.043 ± 0.000          0.041 ± 0.005         0.9
2C9-Leu359 (25)            0.031 ± 0.009          0.040 ± 0.01          1.3
2C8                        0.037 ± 0.001          0.016 ± 0.001         0.4
2C18-Thr385 (29c) + b5     0.042 ± 0.004          0.054 ± 0.003 a       1.3
2C18-Thr385 (29c), no b5   0.034 ± 0.008
2C18-Met385 (6b)           0.023 ± 0.004          0.019 ± 0.005         0.9
2C19 (11a)                 4.6 ± 0.3 a,b,d        0.014 ± 0.02 a        0.03
Human liver
microsomes HB1 6           0.283 ± 0.037 a,c,d    0.117 ± 0.017 a,c     0.4

S-Mephenytoin hydroxylase assayed as described in Methods. Reaction
mixtures contained 10 pmol of recombinant CYP2C19 or 50 pmol of other
recombinant CYP2C yeast microsomes, 500 U of purified P450 reductase and
15 µg phospholipid per 50 pmol of P450, and 0.4 mM radioactive substrate in
0.1 M HEPES buffer (pH 7.4). Unless otherwise stated, recombinant yeast
microsomes were also reconstituted with a 2:1 molar ratio of cytochrome b5.
Reactions were incubated at 37°C for 30 min with 1 mM NADPH. Control
reactions contained the same reaction mixture and were incubated similarly
with an equivalent amount of control yeast microsomal protein (1 mg).
Specific content of P450 of the recombinant yeast microsomes ranged from
35-48 pmol/mg except for 2C8 (191 pmol/mg) and 2C19 (17 pmol/mg). Control
liver reactions contained 0.1 mg microsomal protein but were not fortified
with reductase, cytochrome b5, or phospholipid and were incubated with 0.1
M phosphate buffer (pH 7.4). Values represent the means ± SE.

a Activity significantly higher than that of control yeast microsomes, P <
0.05. Analysis of variance and Fisher's least significant difference test.

b 2C19 activity significantly higher than activities of all other
recombinant CYP2C proteins or human liver microsomes, P < 0.05.

c Human liver microsomes significantly higher than recombinant microsomes
except 2C19, P < 0.05.

d Significant difference between S- and R-mephenytoin hydroxylase
activities, P < 0.05.



TABLE IV

[Table IV, which reports the activities of the purified recombinant CYP2C
proteins in the reconstituted system, was printed sideways and is not
legible in the source scan.]

Recombinant CYP2C proteins were purified from yeast
microsomes, and their ability to 4'-hydroxylate the S- and
R-enantiomers of mephenytoin was also examined in a
reconstituted system (Table IV). 2C19 had similar turnover
numbers for S-mephenytoin 4'-hydroxylation in the
reconstituted system and in recombinant yeast microsomes
fortified with reductase. This turnover number was at least
10 times higher than that of human liver microsomes, and it
was 50-100 times higher than that of recombinant 2C9, 2C18 or
2C8. The turnover number of recombinant 2C9 was ~100 times
higher than the activity of a preparation of 2C9 purified from
human liver. 4'-Hydroxylation of mephenytoin by 2C19 was
stereospecific for the S-enantiomer, while metabolism by 2C9
was not stereospecific. Surprisingly, 2C18 appeared to be
stereoselective for the R-enantiomer of mephenytoin. The
turnover number of 2C19 for S-mephenytoin 4'-hydroxylase was
also ~30 times higher than the turnover number reported for a
P450 preparation purified from human liver by Srivastava et
al., Mol. Pharmacol. 40:69-79 (1991) (0.21 nmol/min/nmol
P450).

Although 2C9 exhibits poor catalytic activity toward
S-mephenytoin, this cytochrome appears to be the principal
tolbutamide hydroxylase (Tables IV and V). The turnover
numbers for hydroxylation of tolbutamide by the purified
recombinant 2C9 were somewhat lower than those of 2C9 purified
from human liver in the absence of exogenous reductase. The
Ile359 allele of 2C9 had a 3-fold higher turnover number for
tolbutamide than the Leu359 allele when the activity of the
recombinant microsomes was adjusted for P450 content
(Table V). 2C19 also appeared to metabolize tolbutamide at a
rate comparable to that of 2C9, although this rate was
difficult to estimate due to the low specific content of P450
in the recombinant 2C19 yeast clone available at the time of
these assays. The two alleles of 2C18 exhibited lower
tolbutamide hydroxylase activity than 2C9 in recombinant yeast
microsomes.

TABLE V

Tolbutamide Hydroxylase Activities of
Recombinant Human CYP2C Yeast Microsomes

                               P450 Content  Tolbutamide Hydroxylase Activity
Microsomes                     (pmol/mg)     (nmol/min/mg protein)  (nmol/min/nmol P450)

Control Yeast                  <5            0.3 ± 0.01
2C9-Ile359 (65)                55            169.8 ± 7.4 a,b        3.4 ± 0.15
2C9-Leu359 (25)                20            14.8 ± 0.3 a,c         0.99 ± 0.02
2C8                            80            8.5 ± 0.2 a            0.11 ± 0.003
2C18-Asp2 Thr385 (29c-1a)      53            9.3 ± 0.7 a            0.19 ± 0.02
2C18-Asp2 Met385 (6b-9)        34            11.1 ± 1.2 a           0.37 ± 0.04
2C19 (11a-3)                   <7            18.4 ± 2.4 a,d         ND
UC8936 Human Liver Microsomes  227           116 ± 0.8 a            2.3 ± 0.02

Tolbutamide hydroxylase activities were measured as described in Methods.
Reaction mixtures contained 1 mg yeast microsomal protein or 0.2 mg UC8936
human liver microsomal protein (50 pmol P450). Purified P450 reductase
(1,000 units) was included in reactions with yeast microsomes but not human
microsomes. Values are the means ± SE. ND = not calculated due to low
specific content of 2C19 in yeast in this experiment.

a Significantly higher than control yeast microsomes, P<0.05. Pairwise
comparisons using Fisher's Least Significant Difference test.

b Clone 65 significantly higher than all other clones (P<0.0001).

c Clone 25 significantly greater than 2C8 (P<0.0005).

d Clone 11a significantly higher than 2C8 (P<0.0001).



The data show that CYP2C19 stereospecifically
hydroxylates S-mephenytoin at the 4'-position at a rate which
is at least 10 times higher than the rate in human liver
microsomes. This is the first example of a human CYP protein
which metabolizes S-mephenytoin with a turnover number
appreciably higher than that of human liver microsomes. Other
2C proteins showed a 100-fold reduced activity relative to
2C19. One of the 2C9 variants tested (Ile359) is identical to
that reported by Yasumori et al., supra, to show a low level of
S-mephenytoin 4'-hydroxylase activity. The low rate of
4'-hydroxylation of S-mephenytoin by 2C9 detected in the present
study with high specific activity 14C-labeled S-mephenytoin
undoubtedly explains the conflicting reports from various
laboratories concerning the ability of this cytochrome to
metabolize mephenytoin (Yasumori et al., supra; Srivastava et
al., supra; Relling et al., supra).
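
As a rough check on the fold differences quoted above, the S-mephenytoin turnover numbers in Table III can be compared directly. The sketch below is illustrative only: the values are copied from Table III, while the dictionary layout and the helper name `fold_over` are assumptions introduced for this example and are not part of the original methods.

```python
# S-mephenytoin 4'-hydroxylase turnover numbers (nmol/min/nmol P450),
# copied from Table III. The data layout is an assumption for illustration.
S_ACTIVITY = {
    "2C19": 4.6,
    "human liver microsomes": 0.283,
    "2C9-Ile359": 0.043,
    "2C9-Leu359": 0.031,
    "2C8": 0.037,
    "2C18-Thr385 + b5": 0.042,
    "2C18-Met385": 0.023,
}


def fold_over(reference: str, other: str) -> float:
    """Ratio of the reference turnover number to that of another preparation."""
    return S_ACTIVITY[reference] / S_ACTIVITY[other]


if __name__ == "__main__":
    for name in S_ACTIVITY:
        if name != "2C19":
            print(f"2C19 vs {name}: {fold_over('2C19', name):.0f}-fold")
```

With these inputs, 2C19 comes out roughly 16-fold above human liver microsomes and on the order of 100-fold or more above the other recombinant 2C proteins, in line with the statements in the text.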

(d) Comparisons of Immunoblot Analysis of CYP2C
Proteins in Human Livers with Liver Microsomal S-Mephenytoin
4'-Hydroxylase Activities

Microsomes from 16 human liver donor samples
previously assayed for S- and R-mephenytoin 4'-hydroxylase
activities were analyzed for CYP2C proteins by Western blot
analysis (Fig. 10) using an antibody to 2C8 and a polyclonal
antibody to 2C9 and 2C19. Both 2C18 and 2C19 have mobilities
similar to that of the low molecular weight band recognized in
human microsomes by most antibodies to 2C9. However, an
antibody to a 2C19 peptide was specific for 2C19. 2C18 could
not be detected in human liver samples using a peptide
antibody to 2C18 (~5 pmol detection limit), indicating that
this polypeptide is expressed poorly (<50 pmol/mg).

The 2C19 content of liver microsomes was consistent
with their S-mephenytoin 4'-hydroxylase activities (Fig. 10).
In particular, samples 129 and 130 had extremely low
S-mephenytoin 4'-hydroxylase values and low S/R ratios, and 2C19
appeared to be essentially absent in these microsomal samples.
Densitometric analysis of immunoblots revealed that the 2C19
content of the 16 human liver microsomes correlated
significantly with S-mephenytoin 4'-hydroxylase activity
(r=0.718, P<0.005) (Fig. 11), but that the content of 2C9 did
not correlate with this catalytic activity (r=0.49, P>0.05).
There was also a significant correlation between 2C8 content
and S-mephenytoin 4'-hydroxylase activity (r=0.82, P<0.0001).
However, this correlation was probably fortuitous, because 2C8
shows very low S-mephenytoin 4'-hydroxylase activity either in
recombinant form or when purified from human liver.
Alternatively, the correlation may indicate an indirect
regulatory role for 2C8 in controlling S-mephenytoin
4'-hydroxylase activity.
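
The significance of a correlation coefficient from 16 samples can be checked with the usual t transformation, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The following is a minimal sketch under the assumption that the reported r values are Pearson coefficients tested two-sided; the original does not state which test was used. The function name `pearson_t` and the tabulated critical value are introduced here for illustration.

```python
import math


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r from n paired samples."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-sided 5% critical value of Student's t for 14 degrees of freedom (n = 16),
# taken from a standard t table.
T_CRIT_14DF = 2.145

if __name__ == "__main__":
    for label, r in [("2C19", 0.718), ("2C9", 0.49), ("2C8", 0.82)]:
        t = pearson_t(r, 16)
        print(f"{label}: r={r}, t={t:.2f}, exceeds 5% threshold: {abs(t) > T_CRIT_14DF}")
```

Under these assumptions the 2C19 and 2C8 correlations exceed the 5% threshold while the 2C9 correlation falls just short of it, which agrees with the significance calls reported above.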
(e) Sequences of 2C9 and 2C18 mRNAs in Livers with
High or Low S-Mephenytoin 4'-Hydroxylase Activities

2C18 and 2C9 mRNAs from six of the above livers were
amplified by PCR and directly sequenced through areas of known
allelic variation to determine whether there was a
relationship between S-mephenytoin 4'-hydroxylase activity and
the presence of a particular allelic variant (Table VI). When
the total 2C18 PCR products were sequenced, the two
individuals with the highest S-mephenytoin hydroxylase
activity were homozygous for Thr385 (ACG). Of the two
individuals with the lowest activity, one was homozygous for
Met385, and one was heterozygous for Thr/Met385 (AC/TG). Two
individuals with intermediate activity were also homozygous
for Thr385. Similarly, when 2C9 mRNA from these same
individuals was amplified and sequenced through known allelic
variations, sample 108 (low S-mephenytoin 4'-hydroxylase
activity) was heterozygous at C/T430 (coding for Arg/Cys144),
while the other five individuals were homozygous for C430
(Arg144). On sequencing through bases 1072-1077, all
samples except 106 (high activity) read 1072-TACATT-1077,
coding for Tyr358 Ile359. Sample 106 read TACA/CTT, indicating
that it was heterozygous for Ile/Leu359. These data indicate
that there is no relationship between the S-mephenytoin
4'-hydroxylase activity of human liver microsomes and the
identity of the allelic variants of 2C18 (Thr/Met385) or 2C9
(Arg/Cys144, Tyr/Cys358, Ile/Leu359) in these tissues.

TABLE VI

Alleles in Human Livers with Varying S-Mephenytoin
4'-Hydroxylase Phenotypes

              S-MPOHase
              (nmol/     Liver   2C18
Phenotype     min/mg)    donor   allele       2C9 allele

High          0.286      106     Thr385       Arg144      His276  Tyr358  Ile/Leu359
High          0.351      115     Thr385       Arg144      His276  Tyr358  Ile359
Intermediate  0.070      118     Thr385       Arg144      His276          Leu359
Intermediate  0.081      123     Thr385       Arg144      His276  Tyr358  Ile359
Low           0.051      108     Thr/Met385   Arg/Cys144  His276  Tyr358  Ile359
Low           0.025      129     Met/Met385   Arg144      His276          Ile359
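
The relationship between the base positions and the amino acid positions quoted for these alleles follows from simple codon arithmetic. The sketch below is a hedged illustration: it uses only the codons named in the text and their standard genetic code assignments, and the helper names `codon_number` and `translate` are assumptions made for this example.

```python
# Subset of the standard genetic code covering the codons discussed in the text.
CODON_TABLE = {
    "ACG": "Thr", "ATG": "Met",                # 2C18 position 385 alleles
    "TAC": "Tyr", "ATT": "Ile", "CTT": "Leu",  # 2C9 positions 358 and 359
}


def codon_number(base_position: int) -> int:
    """1-based codon index containing a 1-based coding-sequence position."""
    return (base_position - 1) // 3 + 1


def translate(seq: str) -> list:
    """Translate an in-frame DNA string using the partial table above."""
    return [CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


if __name__ == "__main__":
    print(codon_number(430))    # codon containing base 430 of 2C9
    print(codon_number(1072))   # first codon of the 1072-1077 window
    print(translate("TACATT"))  # common 2C9 read
    print(translate("TACCTT"))  # alternative read seen in sample 106
```

Base 430 falls in codon 144 and bases 1072-1077 span codons 358 and 359, which matches the residue numbering used in Table VI.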



4. Conclusion

These results show that 2C19 has a turnover number
for the 4'-hydroxylation of S-mephenytoin about 100-fold
higher than that of 2C9, 2C18, or 2C8. 2C19 hydroxylation was
stereospecific for the S-enantiomer. The hepatic content of
2C19 in 16 liver microsomal samples correlated with their
S-mephenytoin 4'-hydroxylase activities. 2C9 appeared to be the
primary tolbutamide hydroxylase, although 2C19 may also
contribute to this catalytic activity. The identity of the
allelic variant of 2C9 or 2C18 did not influence S-mephenytoin
4'-hydroxylase activity. These data strongly indicate that
2C19 is the key determinant of S-mephenytoin 4'-hydroxylase
activity in human liver.

Example 6: Diagnostic Assays for Detecting Individuals
Deficient in S-Mephenytoin 4'-Hydroxylase Activity

Individuals deficient in S-mephenytoin 4'-hydroxylase
activity are identified by analysis of their genomic DNA or
cDNA encoding 2C19.

(a) Analysis of full-length cDNA

Liver microsomes were prepared by standard
differential centrifugation methods (2) from human liver
samples previously characterized as varying markedly in
S-mephenytoin 4'-hydroxylase activity in vitro. Total liver RNA
was isolated from the liver samples with TRI Reagent (Molecular
Research Center, Inc.) and reverse transcribed using random
hexamers as 3' primers. Overlapping CYP2C19 cDNA fragments
from five human liver samples that showed poor metabolism of
S-mephenytoin in vitro were amplified by the polymerase chain
reaction (PCR). PCR was performed on an aliquot of the cDNA
in 1X PCR buffer (67 mM Tris-HCl pH 8.8, 17 mM (NH4)2SO4, 10
mM β-mercaptoethanol, 7 µM EDTA, 0.2 mg bovine serum
albumin/ml), 50 µM dATP, dCTP, dGTP and dTTP, 0.25 µM of both
PCR primers, 2.5 U AmpliTaq DNA polymerase (Perkin Elmer
Cetus) and 1.0 mM MgCl2. The PCR conditions were: initial
denaturation at 94°C for 3 min; 35 cycles consisting of
denaturation at 94°C for 30 sec, annealing at 53°C for 30 sec
and extension at 72°C for 30 sec; and final extension at 72°C
for 10 min, using a Perkin Elmer thermocycler. PCR products
(20 µl) were analyzed on 3% agarose gels stained with ethidium
bromide.
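
For planning purposes, the cycling program described above can be written down as data and its programmed hold time added up. This is only a sketch: the step values are taken from the text, while the variable names and the decision to ignore ramp time between temperatures are assumptions made for this example.

```python
# Thermocycler program from the text: (step name, temperature in deg C, seconds).
INITIAL = [("initial denaturation", 94, 3 * 60)]
CYCLE = [("denaturation", 94, 30), ("annealing", 53, 30), ("extension", 72, 30)]
FINAL = [("final extension", 72, 10 * 60)]
N_CYCLES = 35


def total_hold_seconds() -> int:
    """Sum of programmed hold times, ignoring ramp time between temperatures."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE)
    fixed = sum(seconds for _, _, seconds in INITIAL + FINAL)
    return fixed + N_CYCLES * per_cycle


if __name__ == "__main__":
    total = total_hold_seconds()
    print(f"{total} s of programmed holds ({total / 60:.1f} min)")
```

The holds alone add up to 3930 s, or about 65.5 min; actual run time on an instrument will be longer because of temperature ramping.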

The PCR fragments were purified using Microcon
filters (Amicon Inc.), used in a cycle sequencing
reaction employing fluorescence-tagged dye terminators (PRISM,
Applied Biosystems), and sequenced. One partial CYP2C19 cDNA
was isolated which exhibited aberrant splicing of exon 5 (Fig.
12). This cDNA was missing the initial 40 bases of exon 5,
and was also missing a SmaI site (Fig. 12). This deletion
would be predicted to produce an early stop codon resulting in
a truncated, defective protein.



(b) Rapid Assay for Identifying 40 bp Deletion in cDNA

The analysis of full-length cDNAs identified a 40 bp
deletion as a likely cause of S-mephenytoin 4'-hydroxylase
activity deficiency. A rapid assay was therefore devised to
analyze the specific region of a 2C19 cDNA molecule spanning
the 40 bp deletion.

Specific PCR primers were designed to amplify the
region of the CYP2C19 cDNA spanning the deletion (Figs. 12 and
13). mRNA from 13 human livers previously characterized for
extensive or poor metabolism of S-mephenytoin in vitro was
reverse transcribed and amplified by PCR. Liver samples with
the highest S-mephenytoin hydroxylase activity contained only
the normally spliced mRNA. By contrast, sample 35 (a probable
poor metabolizer) produced an amplification product containing
the 40 bp deletion. Samples with intermediate S-mephenytoin
4'-hydroxylase activity and low amounts of CYP2C19 protein
exhibited both the normal 2C19 cDNA and 2C19 cDNA containing
the 40 bp deletion.

(c) Genomic Sequencing of 2C19

Because human tissue samples containing genomic 2C19
DNA are much more easily obtained than samples containing 2C19
mRNA, it is preferable to diagnose a polymorphic defect from
genomic DNA. Genomic DNA was isolated from the blood of human
volunteers previously characterized as poor or extensive
metabolizers of S-mephenytoin in vivo. The in vivo phenotype
of most Swiss subjects was based on a hydroxylation index,
with a value above 5.6 identifying a poor metabolizer (Kupfer
et al., Eur. J. Clin. Pharmacol. 26:753-759 (1984)). The in
vivo phenotype of American, Oriental and one Swiss subject was
based on the urinary S/R ratio (Wedlund et al., Clin.
Pharmacol. Ther. 36:773-780 (1984)), a poor metabolizer (PM)
being defined as having a ratio > 0.95. An extensive
metabolizer (EM) is defined as having a ratio < 0.8. An
intermediate phenotype (IM) has been previously described with
the extent of 4'-hydroxylation being greater than in PMs but
with the rate of metabolite formation being slower than in EMs
(Arns et al., Pharmacologist 32:140 (1990)).
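
The phenotype cut-offs above translate directly into a small classifier. The sketch below is an illustration under stated assumptions: the thresholds (0.95, 0.8 and 5.6) come from the text, but the function names are invented here, and the labels returned for values the text does not classify ("unclassified", "non-PM") are placeholders rather than definitions from the cited studies.

```python
def phenotype_from_sr_ratio(sr_ratio: float) -> str:
    """Classify by urinary S/R ratio: PM if > 0.95, EM if < 0.8."""
    if sr_ratio > 0.95:
        return "PM"
    if sr_ratio < 0.8:
        return "EM"
    # The text gives no rule for ratios from 0.8 to 0.95 (assumed label).
    return "unclassified"


def phenotype_from_hydroxylation_index(index: float) -> str:
    """Classify by hydroxylation index: a value above 5.6 identifies a PM."""
    # "non-PM" is an assumed label; the text only defines the PM side.
    return "PM" if index > 5.6 else "non-PM"


if __name__ == "__main__":
    print(phenotype_from_sr_ratio(1.0))
    print(phenotype_from_sr_ratio(0.3))
    print(phenotype_from_hydroxylation_index(12.0))
```

Keeping the two rules in separate functions reflects the fact that different subject groups were phenotyped by different methods.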

It was believed that the 40 bp deletion identified
in 2C19 cDNA occurred in exon 5, near the border with intron 4,
based on a comparison of the gene structure of CYP2C9 and
CYP2C18 (de Morais et al., supra). Thus, a segment of genomic
2C19 DNA across the intron 4/exon 5 border was amplified to
identify the corresponding genetic defect in genomic DNA. In
the initial assays, the untranslated regions of the genomic
2C19 sequence were not known. However, intron 4 primers could
be designed based on the corresponding sequences from CYP2C9,
which are expected to show about 95% sequence identity based
on comparison with partial genomic sequences of 2C19. The
primer for exon 5 was based on the cDNA sequence of CYP2C19
(see Example 1). The amplified DNA fragment was found to have
the same size in both poor and extensive metabolizers.
However, on restriction analysis, it was found that only the
fragment from extensive metabolizers could be digested with
SmaI. The amplified DNA fragment was sequenced in extensive
and poor metabolizers.

Provision of genomic 2C19 DNA sequence in the intron
4 region allowed the design of a specific intron primer
exhibiting perfect complementarity to the 2C19 DNA sequence in
subsequent experiments. The forward PCR primer from intron 4
was 5'-AATTACAACCAGAGCTTGGC-3' and the reverse primer from
exon 5 was 5'-TATCACTTTCCATAAAAGCAAG-3'. The forward primer
anneals 81 bp upstream of the intron 4/exon 5 junction. PCR
conditions were as for amplification of cDNA except that
reactions used 200 ng of genomic DNA and an initial
denaturation at 96°C for 5 min. PCR products were restricted
with SmaI in the PCR buffer, without purification. Uncut
products had the same size (168 bp) in all samples. Digested
PCR products were analyzed on 4% agarose gels stained with
ethidium bromide.

DNA from 18 unrelated Caucasian extensive
metabolizers and 10 unrelated Caucasian poor metabolizers was
analyzed by this strategy (Fig. 14C). All extensive
metabolizers were either homozygous or heterozygous for the
normal CYP2C19 gene, defined here as CYP2C19wt (wild type).
Among the 10 poor metabolizers, 7 were homozygous for the
defective gene, defined as CYP2C19m (poor mephenytoin
hydroxylation). One poor metabolizer was heterozygous
(CYP2C19wt/CYP2C19m), and two were homozygous for the wild-type
gene (CYP2C19wt/CYP2C19wt), indicating that CYP2C19m accounted
for 15 of 20 alleles tested (75%) in Caucasian poor metabolizers.
The presence of 5 CYP2C19wt alleles in poor metabolizers
suggests that additional mutations may exist in the Caucasian
population, but that CYP2C19m represents the predominant defect.
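
The allele frequency quoted for the Caucasian poor metabolizers follows from counting two alleles per subject. A minimal sketch, assuming only the genotype counts given in the text; the function name `allele_counts` and the short allele labels "m" and "wt" are conveniences introduced for this example.

```python
from collections import Counter


def allele_counts(genotype_counts: dict) -> Counter:
    """Count alleles from a {(allele, allele): number_of_subjects} mapping."""
    counts = Counter()
    for (first, second), n_subjects in genotype_counts.items():
        counts[first] += n_subjects
        counts[second] += n_subjects
    return counts


# Genotypes of the 10 Caucasian poor metabolizers described in the text.
CAUCASIAN_PM = {("m", "m"): 7, ("wt", "m"): 1, ("wt", "wt"): 2}

if __name__ == "__main__":
    counts = allele_counts(CAUCASIAN_PM)
    total = sum(counts.values())
    print(f"CYP2C19m: {counts['m']} of {total} alleles ({100 * counts['m'] / total:.0f}%)")
```

Seven homozygotes contribute 14 defective alleles and the single heterozygote one more, giving 15 of 20 alleles and leaving 5 wild-type alleles unexplained.
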
Segments of DNA spanning the intron 4/exon 5
boundary were also amplified from 17 unrelated Oriental
subjects. Figure 14D shows that 10/17 Oriental poor
metabolizers are homozygous for CYP2C19m, and CYP2C19m
accounts for 25 of 34 alleles (74%) in Oriental poor
metabolizers. All 12 unrelated Oriental extensive
metabolizers were either homozygous or heterozygous for the
CYP2C19wt gene. Thus, the major mutation responsible for the
poor metabolizer phenotype in Orientals is identical to that
found in Caucasians.

The inheritance of CYP2C19m in one Oriental family
previously characterized with respect to the PM trait was also
examined. Figure 14B shows that the poor metabolizer proband
(arrow) and two other related poor metabolizers are homozygous
for CYP2C19m. Two individuals identified earlier as obligate
heterozygotes (family C) (Ward et al., Clin. Pharmacol. Ther.
42:96-99 (1987)) were indeed found to be CYP2C19m/CYP2C19wt.
Thus, the inheritance of the genotype agrees with the
Mendelian autosomal-recessive inheritance of the phenotype.

The DNA of three individuals (CYP2C19wt/CYP2C19wt,
CYP2C19m/CYP2C19m, and CYP2C19wt/CYP2C19m) was amplified as
described above and sequenced directly using an automated
sequencer (Applied Biosystems) (Fig. 15). Surprisingly, the
sequence of intron 4 of the defective gene was identical to
that of the normal gene. The only alteration found in
CYP2C19m was a G→A change in exon 5 corresponding to position
681 of the cDNA. This mutation introduces a cryptic splice
site in this exon. This mutation also abolishes a SmaI site at
this position (CCCGGG → CCCAGG). The cryptic splice site
shows slightly greater sequence identity to the consensus
sequence for mammalian splice sites (Green, Ann. Rev. Cell
Biol. 7:559-599 (1991)) than the normal splice site. A second
potential branch point is also seen near the cryptic splice
site. Surprisingly, the cDNA sequences from CYP2C8 and
CYP2C18 have a comparable potential cryptic splice site at the
same point in exon 5 to that of CYP2C19m, but the presence of
the full-length 2C8 protein on immunoblots of human liver
microsomes indicates that the majority of its mRNA is
spliced correctly.
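
The loss of the SmaI site can be illustrated with a small in silico digest. The sketch below is hedged: SmaI is known to recognize CCCGGG and cut bluntly between CCC and GGG, and the CCCGGG to CCCAGG change is taken from the text, but the short amplicon used here is a made-up placeholder and NOT the real CYP2C19 sequence, and `digest_fragments` is a hypothetical helper.

```python
SMAI_SITE = "CCCGGG"  # SmaI recognition sequence; blunt cut between CCC and GGG


def digest_fragments(seq: str, site: str = SMAI_SITE, cut_offset: int = 3) -> list:
    """Return fragment lengths after cutting seq at every occurrence of site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical amplicon (not the real CYP2C19 sequence) carrying one SmaI site.
WILD_TYPE = "ATGCATTGAT" + "CCCGGG" + "TTAGCATCGA"
# Same amplicon with the G->A change described in the text.
MUTANT = WILD_TYPE.replace("CCCGGG", "CCCAGG")

if __name__ == "__main__":
    print("wild type fragments:", digest_fragments(WILD_TYPE))
    print("mutant fragments:", digest_fragments(MUTANT))
```

The wild-type placeholder is cut into two fragments while the mutant stays intact, which is the basis of the restriction assay: a product that resists SmaI digestion indicates the mutant allele.
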

Three of the samples tested by cDNA analysis in
Figure 13 (sample 13, predicted genotype CYP2C19wt/CYP2C19wt;
sample 21, predicted genotype CYP2C19wt/CYP2C19m; and sample
35, predicted genotype CYP2C19m/CYP2C19m) were retested by
genomic analysis. Perfect agreement was observed. The
cryptic splice site appeared to be used exclusively in sample
35, which is a predicted poor metabolizer, and also in liver RNA
of an additional CYP2C19m/CYP2C19m individual. The selection
of the cryptic splice site results in the absence of CYP2C19
in liver microsomes from poor metabolizers (Fig. 13).

(d) Conclusion

The principal genetic defect (CYP2C19m) which is
responsible for the poor metabolism of S-mephenytoin is a G→A
mutation at position 681 of the coding sequence (within exon
5). CYP2C19m accounts for 75% of the defective alleles in
both Caucasian and Oriental poor metabolizers. The single
base change generates a cryptic internal splice site, which is
used exclusively to produce an aberrantly spliced mRNA
containing a 40 bp deletion. The CYP2C19 protein is virtually
absent in livers of poor metabolizers. The mutation at
position 681 is easily detected by PCR amplification of a
segment of genomic 2C19 DNA spanning the mutation.

Example 7: Identification and Diagnostic Assay for a Second
Polymorphism (designated 636) in 2C19

A second mutation, designated the 636 polymorphism
(also known as CYP2C19m2), has been identified. Genomic DNA from
an Oriental poor metabolizer (subject 43 in Example 6) was
amplified by PCR using a forward primer complementary to the
antisense strand of intron 3 extending from bases -79 to -55
and a reverse primer complementary to the sense strand
extending from 79-89 bases into intron 4 (forward primer
5'-TATTATCTGTTAACTAATATGA-3' (SEQ. ID. No. 57) and reverse primer
5'-ACTTCAGGGCTTGGTCAATA-3' (SEQ. ID. No. 58)). These primers
were selected to amplify a 329 base pair product containing
all of exon 4 and the surrounding intron/exon junctions. See
Figure 17. Sequencing of the PCR products with an Applied
Biosystems sequencer identified two mutations in exon 4 of the
Oriental poor metabolizer. A second mutation at nucleotide
636 entailed a G→A transition at the nucleotide level and the
conversion of a tryptophan codon at position 212 (TGG→TGA) to
a premature stop codon. This change would result in a
truncated 211 amino acid polypeptide containing only the first
4 exons, which would not contain the heme-binding region and
would be inactive. The change at position 636 also destroys a
BamHI site (GGATCC→GAATCC) (or its isoschizomer BstI) at
positions 635-640.
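
The consequences of the base change at position 636 can be checked with the same kind of codon arithmetic. This is a sketch under stated assumptions: the codon (TGG), the position (636) and the BamHI site coordinates (635-640) come from the text, while the helper names `codon_of` and `mutate` are introduced here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def codon_of(position: int) -> tuple:
    """Return (codon number, 1-based index within that codon) for a coding position."""
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def mutate(seq: str, index: int, new_base: str) -> str:
    """Replace the base at a 1-based index within a short sequence."""
    return seq[:index - 1] + new_base + seq[index:]


if __name__ == "__main__":
    codon, index = codon_of(636)
    changed = mutate("TGG", index, "A")
    print(f"position 636 is base {index} of codon {codon}; TGG -> {changed}")
    print("stop codon:", changed in STOP_CODONS)

    # BamHI site GGATCC occupies positions 635-640, so 636 is its second base.
    print("BamHI site becomes", mutate("GGATCC", 636 - 635 + 1, "A"))
```

Position 636 is the third base of codon 212, so the change turns TGG into the stop codon TGA and leaves 211 residues upstream, and the same substitution converts GGATCC to GAATCC, consistent with the loss of the BamHI site.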

A PCR test was developed using the primers described
above to amplify a 329 base pair product. The PCR product
from the wild-type DNA from extensive metabolizers was cut
with BamHI to yield two expected fragments with sizes of 233
base pairs and 96 base pairs (Fig. 18). The PCR fragment
amplified from the individual with the 636 mutation (i.e.,
Oriental subject #43) could not be restricted, indicating that
he was homozygous for the 636 mutation. Genotyping of 7
Oriental poor metabolizers whose phenotype could not be
explained by the previous 681 mutation indicated that subjects
41 and 43 were homozygous for the 636 mutation, while subjects
36, 48, 11, 69, and 100 were heterozygous, bearing both
636 and 681 mutant alleles. The DNA in homozygous 636 mutant
subjects 41 and 43 was not cut by BamHI. The DNA in the
heterozygotes yielded three bands at 327, 232, and 95 bp. The
DNA from these heterozygotes also yielded three bands on
SmaI digestion (169, 120, and 49 bp), indicating they were also
heterozygous for the 681 mutation (named CYP2C19m).
These data show that the 636 and 681 mutations completely
account for the low phenotypes in all of the Oriental poor
metabolizers of S-mephenytoin tested (17 individuals with 34
alleles).

Three Caucasian poor metabolizers who were not
homozygous for the 681 mutation were also genotyped for the
636 mutation. These were subjects JOB1, 502 and 503. One of
these individuals (JOB1) was heterozygous for the 681 mutation
while the other two did not contain the 681 mutation in either
allele. None of these individuals exhibited a 636 mutation.
Thus, there is probably at least one additional polymorphism
in 2C19 in Caucasians.

In summary, the 681 and 636 mutations explain 100%
of Oriental poor metabolizers, and the 681 mutation alone
accounts for about 75% of Caucasian poor metabolizers.
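
Because each mutation destroys a restriction site, a genotype can be read from whether uncut product, cut fragments, or both are seen after digestion, and the calls can then be tallied across subjects. The sketch below is illustrative only. The subject counts come from the text; the function names, the boolean encoding of gel lanes, and the assumption that the 10 subjects homozygous for the 681 mutation are wild type at position 636 are introduced here and are not stated in the original.

```python
def call_site(uncut_band: bool, cut_bands: bool) -> str:
    """Interpret one digest, given that the mutation destroys the restriction site."""
    if uncut_band and cut_bands:
        return "heterozygous"
    if uncut_band:
        return "homozygous mutant"
    if cut_bands:
        return "homozygous wild type"
    raise ValueError("no PCR product detected")


def mutant_alleles(call: str) -> int:
    """Number of mutant alleles implied by a genotype call."""
    return {"homozygous mutant": 2, "heterozygous": 1, "homozygous wild type": 0}[call]


# (SmaI digest for 681, BamHI digest for 636, number of subjects); each digest is
# (uncut band seen, cut bands seen) for the 17 Oriental poor metabolizers.
ORIENTAL_PM = [
    ((True, False), (False, True), 10),  # homozygous 681; 636 status assumed wild type
    ((False, True), (True, False), 2),   # homozygous for the 636 mutation
    ((True, True), (True, True), 5),     # one 681 allele and one 636 allele
]


def tally(groups: list) -> tuple:
    """Return (681 mutant alleles, 636 mutant alleles, total alleles)."""
    n681 = sum(mutant_alleles(call_site(*sma)) * n for sma, _, n in groups)
    n636 = sum(mutant_alleles(call_site(*bam)) * n for _, bam, n in groups)
    return n681, n636, 2 * sum(n for _, _, n in groups)


if __name__ == "__main__":
    print(tally(ORIENTAL_PM))
```

Under these assumptions the tally gives 25 alleles carrying the 681 mutation and 9 carrying the 636 mutation, which together account for all 34 alleles in the Oriental poor metabolizers.
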

While the foregoing invention has been described in
some detail for purposes of clarity and understanding, it will
be clear to one skilled in the art from a reading of this
disclosure that various changes in form and detail can be made
without departing from the true scope of the invention. All
publications and patent documents cited in this application
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual publication
or patent document were so individually denoted.

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:l: 

Met Asp Pro Phe Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Ile Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Asp Ile Lys
35 40 45

Asp Val Ser Lys Ser Leu Thr Asn Leu Ser Lys Ile Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Glu Arg Met Val Val Leu His Gly Tyr
65 70 75 80

Glu Val Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly His Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val
195 200 205

Ser Thr Pro Trp Ile Gln Ile Cys Asn Asn Phe Pro Thr Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Leu Ala Phe Met
225 230 235 240

Glu Ser Asp Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Ile Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Lys Glu Lys Gln Asn Gln Gln Ser Glu Phe Thr Ile Glu Asn Leu Val
275 280 285

Ile Thr Ala Ala Asp Leu Leu Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Gly His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Ile Asp Leu Ile Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Arg His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Asn Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Phe Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Ile Asp Pro
450 455 460

Lys Asp Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480



Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1746 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CTTCAATGGA TCCTTTTGTG GTCCTTGTGC TCTGTCTCTC ATGTTTGCTT CTCCTTTCAA 60 

TCTGGAGACA GAGCTCTGGG AGAGGAAAAC TCCCTCCTGG CCCCACTCCT CTCCCAGTGA 120 

TTGGAAATAT CCTACAGATA GATATTAAGG ATGTCAGCAA ATCCTTAACC AATCTCTCAA 180

AAATCTATGG CCCTGTGTTC ACTCTGTATT TTGGCCTGGA ACGCATGGTG GTGCTGCATG 240

GATATGAAGT GGTGAAGGAA GCCCTGATTG ATCTTGGAGA GGAGTTTTCT GGAAGAGGCC 300 

ATTTCCCACT GGCTGAAAGA GCTAACAGAG GATTTGGAAT CGTTTTCAGC AATGGAAAGA 360 

GATGGAAGGA GATCCGGCGT TTCTCCCTCA TGACGCTGCG GAATTTTGGG ATGGGGAAGA 420 

GGAGCATTGA GGACCGTGTT CAAGAGGAAG CCCGCTGCCT TGTGGAGGAG TTGAGAAAAA 480 

CCAAGGCTTC ACCCTGTGAT CCCACTTTCA TCCTGGGCTG TGCTCCCTGC AATGTGATCT 540 

GCTCCATTAT TTTCCAGAAA CGTTTCGATT ATAAAGATCA GCAATTTCTT AACTTGATGG 600 

AAAAATTGAA TGAAAACATC AGGATTGTAA GCACCCCCTG GATCCAGATA TGCAATAATT 660 

TTCCCACTAT CATTGATTAT TTCCCGGGAA CCCATAACAA ATTACTTAAA AACCTTGCTT 720 

TTATGGAAAG TGATATTTTG GAGAAAGTAA AAGAACACCA AGAATCGATG GACATCAACA 780 

ACCCTCGGGA CTTTATTGAT TGCTTCCTGA TCAAAATGGA GAAGGAAAAG CAAAACCAAC 840 

AGTCTGAATT CACTATTGAA AACTTGGTAA TCACTGCAGC TGACTTACTT GGAGCTGGGA 900 

CAGAGACAAC AAGCACAACC CTGAGATATG CTCTCCTTCT CCTGCTGAAG CACCCAGAGG 960 

TCACAGCTAA AGTCCAGGAA GAGATTGAAC GTGTCATTGG CAGAAACCGG AGCCCCTGCA 1020 

TGCAGGACAG GGGCCACATG CCCTACACAG ATGCTGTGGT GCACGAGGTC CAGAGATACA 1080 

TCGACCTCAT CCCCACCAGC CTGCCCCATG CAGTGACCTG TGACGTTAAA TTCAGAAACT 1140 
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ACCTCATTCC CAAGGGCACA ACCATATTAA CTTCCCTCAC TTCTGTGCTA CATGACAACA 1200 

AAGAATTTCC CAACCCAGAG ATGTTTGACC CTCGTCACTT TCTGGATGAA GGTGGAAATT 1260 

TTAAGAAAAG TAACTACTTC ATGCCTTTCT CAGCAGGAAA ACGGATTTGT GTGGGAGAGG 1320 

GCCTGGCCCG CATGGAGCTG TTTTTATTCC TGACCTTCAT TTTACAGAAC TTTAACCTGA 1380 

AATCTCTGAT TGACCCAAAG GACCTTGACA CAACTCCTGT TGTCAATGGA TTTGCTTCTG 1440 

TCCCGCCCTT CTATCAGCTG TGCTTCATTC CTGTCTGAAG AAGCACAGAT GGTCTGGCTG 1500 

CTCCTGTGCT GTCCCTGCAG CTCTCTTTCC TCTGGTCCAA ATTTCACTAT CTGTGATGCT 1560 

TCTTCTGACC CGTCATCTCA CATTTTCCCT TCCCCCAAGA TCTAGTGAAC ATTCAGCCTC 1620 

CATTAAAAAA GTTTCACTGT GCAAATATAT CTGCTATTCC CCATACTCTA TAATAGTTAC 1680 

ATTGAGTGCC ACATAATGCT GATACTTGTC TAATGTTGAG TTATTAACAT ATTATTATTA 1740 

AATAGA 1746 

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
260 265 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
275 280 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Leu Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1854 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



GAGAAGGCTT CAATGGATTC TCTTGTGGTC CTTGTGCTCT GTCTCTCATG TTTGCTTCTC 60

CTTTCACTCT GGAGACAGAG CTCTGGGAGA GGAAAACTCC CTCCTGGCCC CACTCCTCTC 120

CCAGTGATTG GAAATATCCT ACAGATAGGT ATTAAGGACA TCAGCAAATC CTTAACCAAT 180

CTCTCAAAGG TCTATGGCCC TGTGTTCACT CTGTATTTTG GCCTGAAACC CATAGTGGTG 240

CTGCATGGAT ATGAAGCAGT GAAGGAAGCC CTGATTGATC TTGGAGAGGA GTTTTCTGGA 300

AGAGGCATTT TCCCACTGGC TGAAAGAGCT AACAGAGGAT TTGGAATTGT TTTCAGCAAT 360

GGAAAGAAAT GGAAGGAGAT CCGGCGTTTC TCCCTCATGA CGCTGCGGAA TTTTGGGATG 420

GGGAAGAGGA GCATTGAGGA CCGTGTTCAA GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG 480

AGAAAAACCA AGGCCTCACC CTGTGATCCC ACTTTCATCC TGGGCTGTGC TCCCTGCAAT 540

GTGATCTGCT CCATTATTTT CCATAAACGT TTTGATTATA AAGATCAGCA ATTTCTTAAC 600

TTAATGGAAA AGTTGAATGA AAACATCAAG ATTTTGAGCA GCCCCTGGAT CCAGATCTGC 660

AATAATTTTT CTCCTATCAT TGATTACTTC CCGGGAACTC ACAACAAATT ACTTAAAAAC 720

GTTGCTTTTA TGAAAAGTTA TATTTTGGAA AAAGTAAAAG AACACCAAGA ATCAATGGAC 780

ATGAACAACC CTCAGGACTT TATTGATTGC TTCCTGATGA AAATGGAGAA GGAAAAGCAC 840

AACCAACCAT CTGAATTTAC TATTGAAAGC TTGGAAAACA CTGCAGTTGA CTTGTTTGGA 900

GCTGGGACAG AGACGACAAG CACAACCCTG AGATATGCTC TCCTTCTCCT GCTGAAGCAC 960

CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG ATTGAACGTG TGATTGGCAG AAACCGGAGC 1020

CCCTGCATGC AAGACAGGAG CCACATGCCC TACACAGATG CTGTGGTGCA CGAGGTCCAG 1080

AGATACCTTG ACCTTCTCCC CACCAGCCTG CCCCATGCAG TGACCTGTGA CATTAAATTC 1140

AGAAACTATC TCATTCCCAA GGGCACAACC ATATTAATTT CCCTGACTTC TGTGCTACAT 1200

GACAACAAAG AATTTCCCAA CCCAGAGATG TTTGACCCTC ATCACTTTCT GGATGAAGGT 1260

GGCAATTTTA AGAAAAGTAA ATACTTCATG CCTTTCTCAG CAGGAAAACG GATTTGTGTG 1320

GGAGAAGCCC TGGCCGGCAT GGAGCTGTTT TTATTCCTGA CCTCCATTTT ACAGAACTTT 1380

AACCTGAAAT CTCTGGTTGA CCCAAAGAAC CTTGACACCA CTCCAGTTGT CAATGGTTTT 1440

GCCTCTGTGC CGCCCTTCTA CCAGCTGTGC TTCATTCCTG TCTGAAGAAG AGCAGATGGC 1500

CTGGCTGCTG CTGTGCAGTC CCTGCAGCTC TCTTTCCTCT GGGGCATTAT CCATCTTTCA 1560

CTATCTGTAA TGCCTTTTCT CACCTGTCAT CTCACATTTT CCCTTCCCTG AAGATCTAGT 1620

GAACATTCGA CCTTCATTAC GGAGAGTTTC CTATGTTTCA CTGTGCAAAT ATATCTGCTA 1680

TTCTCCATAC TCTGTAACAG TTGCATTGAC TGTCACATAA TGCTCATACT TATCTAATGT 1740

TGAGTTATTA ATATGTTATT ATTAAATAGA GAAATATGAT TTGTGTATTA TAATTCAAAG 1800

GCATTTCTTT TCTGCATGTT CTAAATAAAA AGCATTATTA TTTGCTGAAA AAAA 1854 

(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220



Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480




Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2009 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

GGCACCGGAA AGAACAAGAA AAAAGAACAC CTTATTTTTA TCTTCTTCAG TGAGCCAATG 60 

TTCATTCAAA AGAGAGATTA AAGTGCTTTT TGCTGACTAG TCACAGTCAG AGTCAGAATC 120 

ACAGGTGGAT TAGTAGGGAG TGTTATAAAA GCCTTGAAGT GAAAGCCCGC AGTTGTCTTA 180 

CTAAGAAGAG AAGCCTTCAA TGGATCCAGC TGTGGCTCTG GTGCTCTGTC TCTCCTGTTT 240 

GTTTCTCCTT TCACTCTGGA GGCAGAGCTC TGGAAGAGGG AGGCTCCCGT CTGGCCCCAC 300

TCCTCTCCCG ATTATTGGAA ATATCCTGCA GTTAGATGTT AAGGACATGA GCAAATCCTT 360 

AACCAATTTC TCAAAAGTCT ATGGCCCTGT GTTCACTGTG TATTTTGGCC TGAAGCCCAT 420 

TGTGGTGTTG CATGGATATG AAGCAGTGAA GGAGGCCCTG ATTGATCATG GAGAGGAGTT 480

TTCTGGAAGA GGAAGTTTTC CAGTGGCTGA AAAAGTTAAC AAAGGACTTG GAATCCTTTT 540 

CAGCAATGGA AAGAGATGGA AGGAGATCCG GCGTTTCTGC CTCATGACTC TGCGGAATTT 600 

TGGGATGGGG AAGAGGAGCA TCGAGGACCG TGTTCAAGAG GAAGCCCGCT GCCTTGTGGA 660 

GGAGTTGAGA AAAACCAATG CCTCACCCTG TGATCCCACT TTCATCCTGG GCTGTGCTCC 720 

CTGCAATGTG ATCTGCTCTG TTATTTTCCA TGATCGATTT GATTATAAAG ATCAGAGGTT 780 

TCTTAACTTG ATGGAAAAAT TCAATGAAAA CCTCAGGATT CTGAGCTCTC CATGGATCCA 840 

GGTCTGCAAT AATTTCCCTG CTCTCATCGA TTATCTCCCA GGAAGTCATA ATAAAATAGC 900 

TGAAAATTTT GCTTACATTA AAAGTTATGT ATTGGAGAGA ATAAAAGAAC ATCAAGAATC 960 

CCTGGACATG AACAGTGCTC GGGACTTTAT TGATTGTTTC CTGATCAAAA TGGAACAGGA 1020 

AAAGCACAAT CAACAGTCTG AATTTACTGT TGAAAGCTTG ATAGCCACTG TAACTGATAT 1080 

GTTTGGGGCT GGAACAGAGA CAACGAGCAC CACTCTGAGA TATGGACTCC TGCTCCTGCT 1140 

GAAGTACCCA GAGGTCACAG CTAAAGTCCA GGAAGAGATT GAATGTGTAG TTGGCAGAAA 1200 

CCGGAGCCCC TGTATGCAGG ACAGGAGTCA CATGCCCTAC ACAGATGCTG TGGTGCACGA 1260 

GATCCAGAGA TACATTGACC TCCTCCCCAC CAACCTGCCC CATGCAGTGA CCTGTGATGT 1320 

TAAATTCAAA AACTACCTCA TCCCCAAGGG CACGACCATA ATAACATCCC TGACTTCTGT 1380 

GCTGCACAAT GACAAAGAAT TCCCCAACCC AGAGATGTTT GACCCTGGCC ACTTTCTGGA 1440 

TAAGAGTGGC AACTTTAAGA AAAGTGACTA CTTCATGCCT TTCTCAGCAG GAAAACGGAT 1500 

GTGTATGGGA GAGGGCCTGG CCCGCATGGA GCTGTTTTTA TTCCTGACCA CCATTTTGCA 1560
GAACTTTAAC CTGAAATCTC AGGTTGACCC AAAGGATATT GACATCACCC CCATTGCCAA 1620

TGCATTTGGT CGTGTGCCAC CCTTGTACCA GCTCTGCTTC ATTCCTGTCT GAAGAAGGGC 1680 

AGATAGTTTG GCTGCTCCTG TGCTGTCACC TGCAATTCTC CCTTATCAGG GCCATTAGCC 1740 

TCTCCCTTCT CTCTGTGAGG GATATTTTCT CTGACTTGTC AATCCACATC TTCCCATTCC 1800 

CTCAAGATCC AATGAACATC CAACCTCCAT TAAAGAGAGT TTCTTGGGTC ACTTCCTAAA 1860 

TATATCTGCT ATTCTCCATA CTCTGTATCA CTTGTATTGA CCACCACATA TGCTAATACC 1920 

TATCTACTGC TGAGTTGTCA GTATGTTATC ACTAGAAAAC AAAGAAAAAT GATTAATAAA 1980 

TGACAATTCA GAGCCAAAAA AAAAAAAAA 2009 



(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Met Glu Pro Phe Val Val Leu Val Leu Cys Leu Ser Phe Met Leu Leu
1 5 10 15

Phe Ser Leu Trp Arg Gln Ser Cys Arg Arg Arg Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Met Leu Gln Ile Asp Val Lys
35 40 45

Asp Ile Cys Lys Ser Phe Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Met Asn Pro Ile Val Val Phe His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Asn Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Asn Ser Pro Ile Ser Gln Arg Ile Thr Lys Gly Leu Gly Ile
100 105 110

Ile Ser Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Thr Asn Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala His Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Val Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Asn Phe Leu Thr Leu Met Lys Arg Phe Asn Glu Asn Phe Arg Ile Leu
195 200 205

Asn Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Leu Leu Ile Asp
210 215 220

Cys Phe Pro Gly Thr His Asn Lys Val Leu Lys Asn Val Ala Leu Thr
225 230 235 240

Arg Ser Tyr Ile Arg Glu Lys Val Lys Glu His Gln Ala Ser Leu Asp
245 250 255

Val Asn Asn Pro Arg Asp Phe Met Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys Asp Asn Gln Lys Ser Glu Phe Asn Ile Glu Asn Leu Val
275 280 285

Gly Thr Val Ala Asp Leu Phe Val Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Asp His Val Ile Gly Arg His Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ser Asp Leu Val Pro Thr Gly Val Pro His
355 360 365

Ala Val Thr Thr Asp Thr Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Met Ala Leu Leu Thr Ser Val Leu His Asp Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Asn Ile Phe Asp Pro Gly His Phe Leu Asp Lys Asn
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Ala Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Val Asp Asp Leu
450 455 460

Lys Asn Leu Asn Thr Thr Ala Val Thr Lys Gly Ile Val Ser Leu Pro
465 470 475 480

Pro Ser Tyr Gln Ile Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1829 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:



[illegible] ATGCTTCTCT TTTCACTCTG 60

[illegible] ACTCCTCTTC CTATTATTGG 120

[illegible] TTCACCAATT TCTCAAAAGT 180

[illegible] ATAGTGGTGT TTCATGGATA 240

[illegible] 300

[illegible] GAAAGAGATG 360

[illegible] TTTGGGATGG GGAAGAGGAG 420

[illegible] 480


GGCTTCACCC TGTGATCCCA CTTTCATCCT GGGCTGTGCT CCCTGCAATG TGATCTGCTC 540

CGTTGTTTTC CAGAAACGAT TTGATTATAA AGATCAGAAT TTTCTCACCC TGATGAAAAG 600

ATTCAATGAA AACTTCAGGA TTCTGAACTC CCCATGGATC CAGGTCTGCA ATAATTTCCC 660

TCTACTCATT GATTGTTTCC CAGGAACTCA CAACAAAGTG CTTAAAAATG TTGCTCTTAC 720

ACGAAGTTAC ATTAGGGAGA AAGTAAAAGA ACACCAAGCA TCACTGGATG TTAACAATCC 780

TCGGGACTTT ATGGATTGCT TCCTGATCAA AATGGAGCAG GAAAAGGACA ACCAAAAGTC 840

AGAATTCAAT ATTGAAAACT TGGTTGGCAC TGTAGCTGAT CTATTTGTTG CTGGAACAGA 900

GACAACAAGC ACCACTCTGA GATATGGACT CCTGCTCCTG CTGAAGCACC CAGAGGTCAC 960

AGCTAAAGTC CAGGAAGAGA TTGATCATGT AATTGGCAGA CACAGGAGCC CCTGCATGCA 1020

GGATAGGAGC CACATGCCTT ACACTGATGC TGTAGTGCAC GAGATCCAGA GATACAGTGA 1080

CCTTGTCCCC ACCGGTGTGC CCCATGCAGT GACCACTGAT ACTAAGTTCA GAAACTACCT 1140

CATCCCCAAG GGCACAACCA TAATGGCATT ACTGACTTCC GTGCTACATG ATGACAAAGA 1200

ATTTCCTAAT CCAAATATCT TTGACCCTGG CCACTTTCTA GATAAGAATG GCAACTTTAA 1260

GAAAAGTGAC TACTTCATGC CTTTCTCAGC AGGAAAACGA ATTTGTGCAG GAGAAGGACT 1320

TGCCCGCATG GAGCTATTTT TATTTCTAAC CACAATTTTA CAGAACTTTA ACCTGAAATC 1380

TGTTGATGAT TTAAAGAACC TCAATACTAC TGCAGTTACC AAAGGGATTG TTTCTCTGCC 1440

ACCCTCATAC CAGATCTGCT TCATCCCTGT CTGAAGAATG CTAGCCCATC TGGCTGCTGA 1500

TCTGCTATCA CCTGCAACTC TTTTTTTATC AAGGACATTC CCACTATTAT GTCTTCTCTG 1560 

ACCTCTCATC AAATCTTCCC ATTCACTCAA TATCCCATAA GCATCCAAAC TCCATTAAGG 1620 

AGAGTTGTTC AGGTCACTGC ACAAATATAT CTGCAATTAT TCATACTCTG TAACACTTGT 1680 

ATTAATTGCT GCATATGCTA ATACTTTTCT AATGCTGACT TTTTAATATG TTATCACTGT 1740 

AAAACACAGA AAAGTGATTA ATGAATGATA ATTTAGTCCA TTTCTTTTGT GAATGTGCTA 1800 

AATAAAAAGT GTTATTAATT GCTGGTTCA 1829 

(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 490 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Met Asp Ser Leu Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Val Ile Gly Asn Ile Leu Gln Ile Gly Ile Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Leu Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Leu Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ile Phe Pro Leu Ala Glu Arg Ala Asn Arg Gly Phe Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Lys Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Ile Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Lys Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Ile Cys Asn Asn Phe Ser Pro Ile Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Met Asp
245 250 255

Met Asn Asn Pro Gln Asp Phe Ile Asp Cys Phe Leu Met Lys Met Glu
260 265 270

Lys Glu Lys His Asn Gln Pro Ser Glu Phe Thr Ile Glu Ser Leu Glu
275 280 285

Asn Thr Ala Val Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Ala Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Val Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Ile Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Ile Ser Leu Thr Ser Val Leu His Asp Asn Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro His His Phe Leu Asp Glu Gly
405 410 415

Gly Asn Phe Lys Lys Ser Lys Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Ala Leu Ala Gly Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Ser Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Asn Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490



(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1852 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GAAGGCTTCA ATGGATTCTC TTGTGGTCCT TGTGCTCTGT CTCTCATGTT TGCTTCTCCT 60

TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA CTCCTCTCCC 120 

AGTGATTGGA AATATCCTAC AGATAGGTAT TAAGGACATC AGCAAATCCT TAACCAATCT 180 

CTCAAAGGTC TATGGCCCTG TGTTCACTCT GTATTTTGGC CTGAAACCCA TAGTGGTGCT 240 

GCATGGATAT GAAGCAGTGA AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG 300 

AGGCATTTTC CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG 360 

AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT TTGGGATGGG 420 

GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC TGCCTTGTGG AGGAGTTGAG 480 

AAAAACCAAG GCCTCACCCT GTGATCCCAC TTTCATCCTG GGCTGTGCTC CCTGCAATGT 540 

GATCTGCTCC ATTATTTTCC ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACTT 600 

AATGGAAAAG TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA 660 

TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC TTAAAAACGT 720 

TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA CACCAAGAAT CAATGGACAT 780 

GAACAACCCT CAGGACTTTA TTGATTGCTT CCTGATGAAA ATGGAGAAGG AAAAGCACAA 840 

CCAACCATCT GAATTTACTA TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC 900 

TGGGACAGAG ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC 960 

AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA ACCGGAGCCC 1020 

CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT GTGGTGCACG AGGTCCAGAG 1080 

ATACATTGAC CTTCTCCCCA CCAGCCTGCC CCATGCAGTG ACCTGTGACA TTAAATTCAG 1140 

AAACTATCTC ATTCCCAAGG GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA 1200 

CAACAAAGAA TTTCCCAACC CAGAGATGTT TGACCCTCAT CACTTTCTGG ATGAAGGTGG 1260 

CAATTTTAAG AAAAGTAAAT ACTTCATGCC TTTCTCAGCA GGAAAACGGA TTTGTGTGGG 1320 

AGAAGCCCTG GCCGGCATGG AGCTGTTTTT ATTCCTGACC TCCATTTTAC AGAACTTTAA 1380 

CCTGAAATCT CTGGTTGACC CAAAGAACCT TGACACCACT CCAGTTGTCA ATGGATTTGC 1440 

CTCTGTGCCG CCCTTCTACC AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT 1500 

GGCTGCTGCT GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT 1560 

ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA GATCTAGTGA 1620 

ACATTCGACC TCCATTACGG AGAGTTTCCT ATGTTTCACT GTGCAAATAT ATCTGCTATT 1680 

CTCCATACTC TGTAACAGTT GCATTGACTG TCACATAATG CTCATACTTA TCTAATGTTG 1740 

AGTTATTAAT ATGTTATTAT TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC 1800 

ATTTCTTTTC TGCATGTTCT AAATAAAAAG CATTATTATT TGCTGAAAAA AA 1852

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Met Asp Pro Ala Val Ala Leu Val Leu Cys Leu Ser Cys Leu Phe Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Arg Leu Pro Ser Gly
20 25 30

Pro Thr Pro Leu Pro Ile Ile Gly Asn Ile Leu Gln Leu Asp Val Lys
35 40 45

Asp Met Ser Lys Ser Leu Thr Asn Phe Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Val Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp His Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Ser Phe Pro Val Ala Glu Lys Val Asn Lys Gly Leu Gly Ile
100 105 110

Leu Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Cys Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Asn
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Val Ile Phe His Asp Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Arg Phe Leu Asn Leu Met Glu Lys Phe Asn Glu Asn Leu Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Val Cys Asn Asn Phe Pro Ala Leu Ile Asp
210 215 220

Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe Ala Tyr Ile
225 230 235 240

Lys Ser Tyr Val Leu Glu Arg Ile Lys Glu His Gln Glu Ser Leu Asp
245 250 255

Met Asn Ser Ala Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Gln Glu Lys His Asn Gln Gln Ser Glu Phe Thr Val Glu Ser Leu Ile
275 280 285

Ala Thr Val Thr Asp Met Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Gly Leu Leu Leu Leu Leu Lys Tyr Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Cys Val Val Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Ile Gln Arg Tyr Ile Asp Leu Leu Pro Thr Asn Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Lys Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Met Thr Ile Ile Thr Ser Leu Thr Ser Val Leu His Asn Asp Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Lys Ser
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Met Cys Met Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Gln Val Asp Pro
450 455 460

Lys Asp Ile Asp Ile Thr Pro Ile Ala Asn Ala Phe Gly Arg Val Pro
465 470 475 480

Pro Leu Tyr Gln Leu Cys Phe Ile Pro Val
485 490

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2258 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGCCTT CAATGGATCC AGCTGTGGCT 60 

CTGGTGCTCT GTCTCTCCTG TTTGTTTCTC CTTTCACTCT GGAGGCAGAG CTCTGGAAGA 120 

GGGAGGCTCC CGTCTGGCCC CACTCCTCTC CCGATTATTG GAAATATCCT GCAGTTAGAT 180 

GTTAAGGACA TGAGCAAATC CTTAACCAAT TTCTCAAAAG TCTATGGCCC TGTGTTCACT 240 

GTGTATTTTG GCCTGAAGCC CATTGTGGTG TTGCATGGAT ATGAAGCAGT GAAGGAGGCC 300 

CTGATTGATC ATGGAGAGGA GTTTTCTGGA AGAGGAAGTT TTCCAGTGGC TGAAAAAGTT 360 

AACAAAGGAC TTGGAATCCT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420 

TGCCTCATGA CTCTGCGGAA TTTTGGGATG GGGAAGAGGA GCATCGAGGA CCGTGTTCAA 480
GAGGAAGCCC GCTGCCTTGT GGAGGAGTTG AGAAAAACCA ATGCCTCACC CTGTGATCCC 540

ACTTTCATCC TGGGCTGTGC TCCCTGCAAT GTGATCTGCT CTGTTATTTT CCATGATCGA 600 

TTTGATTATA AAGATCAGAG GTTTCTTAAC TTGATGGAAA AATTCAATGA AAACCTCAGG 660 

ATTCTGAGCT CTCCATGGAT CCAGGTCTGC AATAATTTCC CTGCTCTCAT CGATTATCTC 720 

CCAGGAAGTC ATAATAAAAT AGCTGAAAAT TTTGCTTACA TTAAAAGTTA TGTATTGGAG 780 

AGAATAAAAG AACATCAAGA ATCCCTGGAC ATGAACAGTG CTCGGGACTT TATTGATTGT 840 

TTCCTGATCA AAATGGAACA GGAAAAGCAC AATCAACAGT CTGAATTTAC TGTTGAAAGC 900 

TTGATAGCCA CTGTAACTGA TATGTTTGGG GCTGGAACAG AGACAACGAG CACCACTCTG 960 

AGATATGGAC TCCTGCTCCT GCTGAAGTAC CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG 1020 

ATTGAATGTG TAGTTGGCAG AAACCGGAGC CCCTGTATGC AGGACAGGAG TCACATGCCC 1080 

TACACAGATG CTGTGGTGCA CGAGATCCAG AGATACATTG ACCTCCTCCC CACCAACCTG 1140 

CCCCATGCAG TGACCTGTGA TGTTAAATTC AAAAACTACC TCATCCCCAA GGGCATGACC 1200 

ATAATAACAT CCCTGACTTC TGTGCTGCAC AATGACAAAG AATTCCCCAA CCCAGAGATG 1260 

TTTGACCCTG GCCACTTTCT GGATAAGAGT GGCAACTTTA AGAAAAGTGA CTACTTCATG 1320 

CCTTTCTCAG CAGGAAAACG GATGTGTATG GGAGAGGGCC TGGCCCGCAT GGAGCTGTTT 1380 

TTATTCCTGA CCACCATTTT GCAGAACTTT AACCTGAAAT CTCAGGTTGA CCCAAAGGAT 1440 

ATTGACATCA CCCCCATTGC CAATGCATTT GGTCGTGTGC CACCCTTGTA CCAGCTCTGC 1500 

TTCATTCCTG TCTGAAGAAG GGCAGATAGT TTGGCTGCTC CTGTGCTGTC ACCTGCAATT 1560 

CTCCCTTATC AGGGCCATTG GCCTCTCCCT TCTCTCTATG AGGGATATTT TCTCTGACTT 1620 

GTCAATCCAC ATCTTCCCAT TCCCTCAAGA TCCAATGAAC ATCCAACCTC CATTAAAGAG 1680 

AGTTTCTTGG GTCACTTCCT AAATATATCT GCTATTCTCC ATACTCTGTA TCACTTGTAT 1740 

TGACCACCAC ATATGCTAAT ACCTATCTAC TGCTGAGTTG TCAGTATGTT ATCACTATAA 1800 

AACAAAGAAA AATGATTAAT AAATGACAAT TCAGAGCCAT TTATTCTCTG CATGCTCTAG 1860 

ATAAAAATGA TTATTATTTA CTGGGTCAGT TCTTAGATTT CTTTCTTTTG AGTAAAATGA 1920 

AAGTAAGAAA TGAAAGAAAA TAGAATGTGA AGAGGCTGTG CTGGCCCTCA TAGTGTTAAG 1980 

CACAAAAAGG GAGAAAGGTA AGAGGGTAGG AAAGCTGTTT TAGCTAAATG CCACCTAGAG 2040 

TTATTGGAGG TCTGAATTTG GAAAAAAAAA CTATGTCCAG GAGCAGCTGT AACCTGTAGG 2100 

GAAATAATGG AACAATCATC CATAAGAGGG ATGAACATTA AGTGTTTGAA TTCATGCTCT 2160 

GCTTTTGTGT TACTGTAAAC ACAAGATCAA GATTTGGATA ATCTTTTTCC TTTGTGTTTC 2220 

CAACTTAGAT CATGTCTAAA TATATGCTTT CATATGGC 2258

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 490 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Met Asp Pro Xaa Val Val Leu Val Leu Cys Leu Ser Cys Leu Leu Leu
1 5 10 15

Leu Ser Leu Trp Arg Gln Ser Ser Gly Arg Gly Lys Leu Pro Pro Gly
20 25 30

Pro Thr Pro Leu Pro Xaa Ile Gly Asn Ile Leu Gln Ile Asp Xaa Lys
35 40 45

Asp Ile Ser Lys Ser Leu Thr Asn Xaa Ser Lys Val Tyr Gly Pro Val
50 55 60

Phe Thr Xaa Tyr Phe Gly Leu Lys Pro Ile Val Val Leu His Gly Tyr
65 70 75 80

Glu Ala Val Lys Glu Ala Leu Ile Asp Leu Gly Glu Glu Phe Ser Gly
85 90 95

Arg Gly Xaa Phe Pro Leu Ala Glu Arg Ala Asn Xaa Gly Xaa Gly Ile
100 105 110

Val Phe Ser Asn Gly Lys Arg Trp Lys Glu Ile Arg Arg Phe Ser Leu
115 120 125

Met Thr Leu Arg Asn Phe Gly Met Gly Lys Arg Ser Ile Glu Asp Arg
130 135 140

Val Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Lys Thr Lys
145 150 155 160

Ala Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn
165 170 175

Val Ile Cys Ser Xaa Ile Phe His Lys Arg Phe Asp Tyr Lys Asp Gln
180 185 190

Gln Phe Leu Asn Leu Met Glu Lys Xaa Asn Glu Asn Ile Arg Ile Leu
195 200 205

Ser Ser Pro Trp Ile Gln Xaa Cys Asn Asn Phe Pro Xaa Xaa Ile Asp
210 215 220

Tyr Phe Pro Gly Thr His Asn Lys Leu Leu Lys Asn Val Ala Phe Met
225 230 235 240

Lys Ser Tyr Ile Leu Glu Lys Val Lys Glu His Gln Glu Ser Xaa Asp
245 250 255

Met Asn Asn Pro Arg Asp Phe Ile Asp Cys Phe Leu Ile Lys Met Glu
260 265 270

Xaa Glu Lys His Asn Gln Gln Ser Glu Phe Thr Ile Glu Ser Leu Xaa
275 280 285
Xaa Thr Xaa Xaa Asp Leu Phe Gly Ala Gly Thr Glu Thr Thr Ser Thr
290 295 300

Thr Leu Arg Tyr Xaa Leu Leu Leu Leu Leu Lys His Pro Glu Val Thr
305 310 315 320

Ala Lys Val Gln Glu Glu Ile Glu Arg Val Ile Gly Arg Asn Arg Ser
325 330 335

Pro Cys Met Gln Asp Arg Ser His Met Pro Tyr Thr Asp Ala Val Val
340 345 350

His Glu Xaa Gln Arg Tyr Ile Asp Leu Leu Pro Thr Ser Leu Pro His
355 360 365

Ala Val Thr Cys Asp Val Lys Phe Arg Asn Tyr Leu Ile Pro Lys Gly
370 375 380

Thr Thr Ile Leu Thr Ser Leu Thr Ser Val Leu His Asp Xaa Lys Glu
385 390 395 400

Phe Pro Asn Pro Glu Met Phe Asp Pro Gly His Phe Leu Asp Xaa Gly
405 410 415

Gly Asn Phe Lys Lys Ser Asp Tyr Phe Met Pro Phe Ser Ala Gly Lys
420 425 430

Arg Ile Cys Val Gly Glu Gly Leu Ala Arg Met Glu Leu Phe Leu Phe
435 440 445

Leu Thr Thr Ile Leu Gln Asn Phe Asn Leu Lys Ser Leu Val Asp Pro
450 455 460

Lys Xaa Leu Asp Thr Thr Pro Val Val Asn Gly Phe Ala Ser Val Pro
465 470 475 480

Pro Phe Tyr Gln Leu Cys Phe Ile Pro Val
485 490



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1892 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

AGTGAAAGCC CGCAGTTGTC TTACTAAGAA GAGAAGNCTT CAATGGATCC TNTTGTGGTC 60 

CTNGTGCTCT GTCTCTCATG TTTGCTTCTC CTTTCACTCT GGAGACAGAG CTCTGGGAGA 120 

GGNAANCTCC CTCCTGGCCC CACTCCTCTC CCANTNATTG GAAATATCCT ACAGATAGAT 180 

NTTAAGGACA TCAGCAAATC CTTAACCAAT NTCTCAAAAG TCTATGGCCC TGTGTTCACT 240 

NTGTATTTTG GCCTGAAACC CATAGTGGTG NTGCATGGAT ATGAAGCAGT GAAGGAAGCC 300 

CTGATTGATC NTGGAGAGGA GTTTTCTGGA AGAGGCANTT TCCCACTGGC TGAAAGAGNT 360 

AACANAGGAN TTGGAATCGT TTTCAGCAAT GGAAAGAGAT GGAAGGAGAT CCGGCGTTTC 420 
TCCCTCATGA [illegible] TTTTGGGATG GGGAAGAGGA GCATTGAGGA CCGTGTTCAA 480

[illegible] [illegible] GGAGGAGTTG AGAAAAACCA AGGCCTCACC CTGTGATCCC 540

[illegible] [illegible] TCCCTGCAAT GTGATCTGCT CCNTTATTTT CCATAAACGN 600

[illegible] [illegible] ATTTCTTAAC TTGATGGAAA AATTNAATGA AAACATCAGG 660

[illegible] [illegible] CCAGNTCTGC AATAATTTNC CTCCTNTCAT TGATTATTTC 720

[illegible] [illegible] ACTTAAAAAN GTTGCTTTTA TGAAAAGTTA TATTTTGGAG 780

[illegible] [illegible] ATCANTGGAC ATGAACAANC CTCGGGACTT TATTGATTGC 840

[illegible] [illegible] GGAAAAGCAC AACCAACAGT CTGAATTTAC TATTGAAAGC 900

[illegible] [illegible] NTTGTTTGGA GCTGGNACAG AGACAACAAG CACNACNCTG 960

[illegible] [illegible] GCTGAAGCAC CCAGAGGTCA CAGCTAAAGT CCAGGAAGAG 1020

[illegible] [illegible] AAACCGGAGC CCCTGCATGC AGGACAGGAG CCACATGCCC 1080

[illegible] [illegible] CGAGNTCCAG AGATACATTG ACCTNCTCCC CACCAGCCTG 1140

[illegible] [illegible] NNTTAAATTC AGAAACTACC TCATNCCCAA GGGCACAACC 1200

[illegible] [illegible] TGTGCTACAT GANNACAAAG AATTTCCCAA CCCAGAGATG 1260

[illegible] [illegible] GGATNANNGT GGCAANTTTA AGAAAAGTNA CTACTTCATG 1320

CCTTTCTCAG [illegible] GATTTGTGTG GGAGANGGCC TGGCCCGCAT GGAGCTGTTT 1380

[illegible] [illegible] ACAGAACTTT AACCTGAAAT CTCTGGTTGA CCCAAANGAC 1440

[illegible] [illegible] CAATGGATTT GCTTCTGTGC CNCCCTTCTA CCAGCTNTGC 1500

[illegible] TCTGAAGAAG GGCAGATGGT CTGGCTGCTN CTGTGCTGTC NCNNNNNNTN 1560

NOTTTNNTCT GGGGCAATTT CCNTCTTNCA TNNNTNTTNN TGCNNTTTNT CATCTGNCAT 1620

CTCACANTNC NNCTTCCCTT ANCATCNAGN NACCATTNAN NNNCAATNTC CAAGAGNGTG 1680

NNTTTNTTNN CTNTCCACCT ANATCTATCN NTNNNNCTNC TNTNTNTNNA TNACTTTGAT 1740

TGTCCNCTAN TGATGNTAAT TNTTTAATAT TGNNTTATTG NNANNNTNTT ATNANTNANA 1800

AANAAATGAT AATTNTNTNN AAATNNNAAG TCANTGCNNT TNANNATNTN CNNAATAAAA 1860

AGCATTATTA TTTGCTGAAA AAAAGTCAGT TC 1892



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

. (ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 33
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCAAGCTTGC CAAACTATCT GCCCTTCT 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ACTTTTCAAT GTAAGCAAAT 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TTAGTAATTC TTTGAGATAT 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CTGTTAGCTC TTTCAGCCAG 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GGAGCACAGC CCAGGATGAA 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
GCAAGCTTAA AAAATGGATC CAGCTGTGGC TCT 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCAAGCTTGC CAAACTATCT GCCCTTCT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TGGCCCTGAT AAGGGAGAAT 
(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
ATCCAGAGAT ACATTGACCT C 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
CCATGAAGTG ACCTGTGATG 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
AAAGATGGAT AATGCCCCAG 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GAAGGAGATC CGGCGTTTCT 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GGCGTTTCTC CCTCATGACG 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TTGTCATTGT GCAG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CACATGCCCT ACACA 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TGACGCTGCG GAATT 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GGACTTTATT GATTG 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ATGATTCTCT TGTGGTCCT 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
AAAGATGGAT AATGCCCCCA G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 
GCAAGCTTAA AAAAATGGAA CCTTTTGTGG TCCT 
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
GCAAGCTTGC CAGATGGGCT AGCATTCT 28 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GCAAGCTTGC CAGGCCATCT GCTCTTCT 



(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
GCAAGCTTAA AAAAATGGAT TCTCTTGTGG TCCT 
(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
GCAAGCTTGC CAGACCATCT GTGCTTCT 



(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligo) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AGCTTAAAAA AATG 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligo) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GATCCATTTT TTTA 



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Cys Ile Asp Tyr Leu Pro Gly Ser His Asn Lys Ile Ala Glu Asn Phe
1 5 10 15

Ala



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Cys Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val Lys
1 5 10

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 284 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2..283

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

A TTG AAT GAA AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG ATA 46
Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile
1 5 10 15

TGC AAT AAT TTT CCC ACT ATC ATT GAT TAT TTC CCG GGA ACC CAT AAC 94
Cys Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn
20 25 30

AAA TTA CTT AAA AAC CTT GCT TTT ATG GAA AGT GAT ATT TTG GAG AAA 142
Lys Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys
35 40 45

GTA AAA GAA CAC CAA GAA TCG ATG GAC ATC AAC AAC CCT CGG GAC TTT 190
Val Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe
50 55 60

ATT GAT TGC TTC CTG ATC AAA ATG GAG AAG GAA AAG CAA AAC CAA CAG 238
Ile Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln
65 70 75

TCT GAA TTC ACT ATT GAA AAC TTG GTA ATC ACT GCA GCT GAC TTA 283
Ser Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
80 85 90

C 284
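
The CDS feature of SEQ ID NO: 45 (LOCATION 2..283) can be checked against the 94-residue translation listed as SEQ ID NO: 46. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `SEQ_45` and `translate_cds` are illustrative and are not part of the listing.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 45 as printed in the listing (spaces are layout only).
SEQ_45 = (
    "A TTG AAT GAA AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG ATA "
    "TGC AAT AAT TTT CCC ACT ATC ATT GAT TAT TTC CCG GGA ACC CAT AAC "
    "AAA TTA CTT AAA AAC CTT GCT TTT ATG GAA AGT GAT ATT TTG GAG AAA "
    "GTA AAA GAA CAC CAA GAA TCG ATG GAC ATC AAC AAC CCT CGG GAC TTT "
    "ATT GAT TGC TTC CTG ATC AAA ATG GAG AAG GAA AAG CAA AAC CAA CAG "
    "TCT GAA TTC ACT ATT GAA AAC TTG GTA ATC ACT GCA GCT GAC TTA "
    "C"
).replace(" ", "")


def translate_cds(seq, start, end):
    """Translate the 1-based, inclusive interval start..end of seq."""
    cds = seq[start - 1:end]
    return "".join(
        CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3)
    )


protein_45 = translate_cds(SEQ_45, 2, 283)
print(len(SEQ_45), len(protein_45))
print(protein_45)
```

The one-letter output can be compared residue by residue with the three-letter rows of SEQ ID NO: 46.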
(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Leu Asn Glu Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln Ile Cys
1 5 10 15

Asn Asn Phe Pro Thr Ile Ile Asp Tyr Phe Pro Gly Thr His Asn Lys
20 25 30

Leu Leu Lys Asn Leu Ala Phe Met Glu Ser Asp Ile Leu Glu Lys Val
35 40 45

Lys Glu His Gln Glu Ser Met Asp Ile Asn Asn Pro Arg Asp Phe Ile
50 55 60

Asp Cys Phe Leu Ile Lys Met Glu Lys Glu Lys Gln Asn Gln Gln Ser
65 70 75 80

Glu Phe Thr Ile Glu Asn Leu Val Ile Thr Ala Ala Asp Leu
85 90

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 44..103

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

ATTGAATGAA AACATCAGGA TTGTAAGCAC CCCCTGGATC CAG GAA CCC ATA ACA 55 

Glu Pro Ile Thr
1

AAT TAC TTA AAA ACC TTG CTT TTA TGG AAA GTG ATA TTT TGG AGA AAG 103
Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile Phe Trp Arg Lys
5 10 15 20 

TAAAAGAACA CCAAGAATCG ATGGACATCA ACAACCCTCG GGACTTTATT GATTGCTTCC 163 

TGATCAAAAT GGAGAAGGAA AAGCAAAACC AACAGTCTGA ATTCACTATT GAAAACTTGG 223 

TAATCACTGC AGCTGACTTA C 244 
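
SEQ ID NO: 47 annotates a short CDS at 44..103, immediately followed by the triplet TAA at position 104. The sketch below, assuming the standard genetic code, translates from position 44 until the first in-frame stop codon so that the 20-residue product listed as SEQ ID NO: 48 can be reproduced. The helper name `translate_until_stop` is illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 47 as printed in the listing (spaces are layout only).
SEQ_47 = (
    "ATTGAATGAA AACATCAGGA TTGTAAGCAC CCCCTGGATC CAG GAA CCC ATA ACA "
    "AAT TAC TTA AAA ACC TTG CTT TTA TGG AAA GTG ATA TTT TGG AGA AAG "
    "TAAAAGAACA CCAAGAATCG ATGGACATCA ACAACCCTCG GGACTTTATT GATTGCTTCC "
    "TGATCAAAAT GGAGAAGGAA AAGCAAAACC AACAGTCTGA ATTCACTATT GAAAACTTGG "
    "TAATCACTGC AGCTGACTTA C"
).replace(" ", "")


def translate_until_stop(seq, start):
    """Translate from 1-based position start until the first stop codon.

    Returns the peptide and the 0-based index of the stop codon
    (None if no in-frame stop codon is found).
    """
    residues = []
    for i in range(start - 1, len(seq) - 2, 3):
        amino = CODON_TABLE[seq[i:i + 3]]
        if amino == "*":
            return "".join(residues), i
        residues.append(amino)
    return "".join(residues), None


peptide_47, stop_index = translate_until_stop(SEQ_47, 44)
print(len(SEQ_47), peptide_47, stop_index + 1)
```

Under these assumptions the reading frame closes after 20 codons, which is consistent with the CDS end at nucleotide 103.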
(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

Glu Pro Ile Thr Asn Tyr Leu Lys Thr Leu Leu Leu Trp Lys Val Ile
1 5 10 15 

Phe Trp Arg Lys 
20 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE:

(A) NAME/KEY: intron 

(B) LOCATION: 1..32 

(ix) FEATURE: 

(A) NAME/KEY: exon 

(B) LOCATION: 33..83



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG 
ATTATTTCCC GGGAACCCAT AAC 



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: intron

(B) LOCATION: 1..72

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 73..83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 
TTTTAATTTA ATAAATTATT GTTTTCTCTT AGATATGCAA TAATTTTCCC ACTATCATTG
ATTATTTCCA AGGAACCCAT AAC 
(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:



ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60

GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120

AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180

GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240

TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300

ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360

ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420

GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480

TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540

GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600

AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTTTT ATTGTATGCA TGAATATCCA 660

GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720

TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780

TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 655 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 263..421

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

AGGGAAAAGA CAAATAGGCC GGGGATGNAA ATTTAGCATG TGAGCAACCT TANTTAACCA 60 

GCTAGGCTGT AATTGNTAAT TCGAGANTAA TGTNAAAGTG ATGTGTTGAT TTTATGCATG 120 

CCNNACTCNT TTTTGCTTTT AAGGGGAGTC ATAGGTAAGA TATTACTTAA AATTTCTAAA 180 
CTATTATTAT CTGTTAACTA ATATGAAGTG TTTTATATCT AATGTTTACT CATATTTTAA 240

AATTGTTTCC AATCATTTAG CT TCA CCC TGT GAT CCC ACT TTC ATC CTG GGC 292
Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly
1 5 10

TGT GCT CCC TGC AAT GTG ATC TGC TCC ATT ATT TTC CAG AAA CGT TTC 340
Cys Ala Pro Cys Asn Val Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe
15 20 25

GAT TAT AAA GAT CAG CAA TTT CTT AAC TTG ATG GAA AAA TTG AAT GAA 388
Asp Tyr Lys Asp Gln Gln Phe Leu Asn Leu Met Glu Lys Leu Asn Glu
30 35 40

AAC ATC AGG ATT GTA AGC ACC CCC TGG ATC CAG GTAAGGACA AGTTTTGTGC 440
Asn Ile Arg Ile Val Ser Thr Pro Trp Ile Gln
45 50

TTCCTGAGAA ACCACTTACA GTCTTTTTTT CTGGGAAATC CAAAATTCTA TATTGACCAA 500 

GCCCTGAAGT ACATTTGTGA ATACTACAGT CTTGCCTAGA CAGCCATGGG GTGAATATCT 560 
GGAAAAGATG GCAAAGNTCT TTATTTTATG CACAGGAAAT GAATATCCCA ATATAGATCA 620 
GGCTTCTAAG CCCATTAGCT CCCTGATCAG TGTTT 655 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Ser Pro Cys Asp Pro Thr Phe Ile Leu Gly Cys Ala Pro Cys Asn Val
1 5 10 15

Ile Cys Ser Ile Ile Phe Gln Lys Arg Phe Asp Tyr Lys Asp Gln Gln
20 25 30

Phe Leu Asn Leu Met Glu Lys Leu Asn Glu Asn Ile Arg Ile Val Ser
35 40 45

Thr Pro Trp Ile Gln
50 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 292 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
ATGAAGTGTT TTATATCTAA TGTTTACTCA TATTTTAAAA TTGTTTCCAA TCATTTAGCT 60 
TCACCCTGTG ATCCCACTTT CATCCTGGGC TGTGCTCCCT GCAATGTGAT CTGCTCCATT 120 
ATTTTCCAGA AACGTTTCGA TTATAAAGAT CAGCAATTTC TTAACTTGAT GGAAAAATTG 180

AATGAAAACA TCAGGATTGT AAGCACCCCC TGAATCCAGG TAAGGACAAG TTTTGTGCTT 240 

CCTGAGAAAC CACTTACAGT CTTTTTTTCT GGGAAATCCA AAATTCTATA TT 292 



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
AATTACAACC AGAGCTTGGC 20 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
TATCACTTTC CATAAAAGCA AG 22 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
TATTATCTGT TAACTAACTA ATATGA 26 



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (primer) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
ACTTCAGGGC TTGGTCAATA 20 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 
ATTGAATGAA AACATCAGGA TTG 23 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
GTAAGTCAGC TGCAGTGATT A 21 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

ATGGTGATGT AGNAANTCAT NCCATCTTAT ATTTCNAGAG TGTAGAGGAG GATTGTTGNG 60 

GAAGTAAGAG GNNTAAGATA GAGATGCNTT TATACTATCC CAAGCAGGGA TRAGTCTAGG 120 

AAATGATTAT CGTCTTTGAT TCTCTTGTCA GRATTTTCTT TCTCMNATCT TGTATAATCA 180 

GAGAATTACT ACACATGGAC AATRAARATT TCCCCNTCCA GATANACAAT ATATTTTATT 240 

TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA 300 

ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC 360 

ATTGATTATT TCCCAGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT 420 

GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC 480 
TTTATTGATT GCTTCCTGAT CAAAATGGAG AAGGTAAAAT GTTAACAAAA GCTTAGTTAT 540

GTGACTGCTT GCGTATKTGT GATTCATTGA CTAGTTGKGT GTTTACTACG GATGTTTAAC 600 

AGGTCAAGGA GTAATGCTTG AGAAGCATAT TTAAGTTCTT ATTGTATGCA TGAATATCCA 660 

GTAAGCATCA TAGAAAATGT AAAATTAANT TGTTAAATAA TTAGAATACA TAGAAGAAAT 720 

TGTTTAGATA AATATNATCT ATCTGAACAA TAAGGATGTC AGGATAGGAA AAGCTCTGTT 780 

TCTGCAGCTT CCAGTGGAGA TCAGCACAGG AGGGAACTTA TTTTTT 826
WHAT IS CLAIMED IS:

1. A purified cytochrome P450 2C19 polypeptide comprising an
amino acid sequence having at least 97% sequence identity with
the amino acid sequence designated SEQ. ID. No. 1.
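
The "at least 97% sequence identity" threshold of claim 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the two sequences are taken to be already aligned to equal length (an alignment gap would count as a mismatch), the toy sequences are invented, and the length of 490 residues is borrowed from the 490-residue polypeptide of SEQ ID NO: 13 because SEQ. ID. No. 1 is not reproduced in this section.

```python
def percent_identity(a, b):
    """Percent identity over two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def max_mismatches(length, threshold_percent=97):
    """Largest number of mismatches that still meets the threshold.

    Integer arithmetic is used to avoid rounding at the boundary.
    """
    return length * (100 - threshold_percent) // 100


# Toy example (hypothetical sequences): 3 substitutions in 100 positions.
reference = "A" * 100
variant = "A" * 97 + "G" * 3
print(percent_identity(reference, variant))

# Assumed length of 490 residues.
print(max_mismatches(490))
```

With a 490-residue reference, 14 substitutions give 476/490 (about 97.1%) and still meet the threshold, whereas 15 substitutions give 475/490 (about 96.9%) and do not.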

2. A purified DNA segment encoding the purified polypeptide of
claim 1.

3. A stable cell line comprising an exogenous DNA segment
encoding a cytochrome P450 2C19 polypeptide of claim 1, the DNA
segment capable of being expressed in the cell line.

4. A method of screening for a drug that is metabolized by
S-mephenytoin 4'-hydroxylase activity, the method comprising the
steps of:
    contacting the drug with a cytochrome P450 2C19 polypeptide
of claim 1; and
    detecting a metabolic product resulting from an interaction
between the drug and the polypeptide, the presence of the product
indicating the drug is metabolized by the S-mephenytoin
4'-hydroxylase activity.

5. A method of diagnosing a patient having a deficiency in
S-mephenytoin 4'-hydroxylase activity, the method comprising:
    obtaining a sample of nucleic acids from the patient; and
    analyzing a cytochrome P450 2C19 DNA sequence from the
nucleic acids in the sample for the presence of a polymorphism
indicative of the deficiency.

6. The method of claim 5, further comprising the step of
amplifying the cytochrome P450 2C19 DNA sequence.

7. The method of claim 6, wherein the P450 2C19 DNA sequence is
genomic.
8. The method of claim 7, wherein the amplifying step is primed
from a forward primer sufficiently complementary with a first
subsequence of the antisense strand of the 2C19 sequence to
hybridize therewith, and a reverse primer sufficiently
complementary to a second subsequence of the sense strand of the
2C19 sequence to hybridize therewith.

9. The method of claim 8, wherein the polymorphism occurs at
nucleotide 681 of the coding region of the P450 2C19 DNA genomic
sequence.

10. The method of claim 9, wherein the first subsequence of the
sense strand is upstream from nucleotide 681 of the coding
region, and the second subsequence of the antisense strand is
downstream of nucleotide 681 of the coding region.

11. The method of claim 10, wherein the analyzing step comprises
digesting the amplified DNA segment with a restriction enzyme
that recognizes a site including nucleotide 681 of the coding
region.
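
The amplify-then-digest analysis of claims 9-11 can be sketched in silico. In the following minimal sketch, the template is an excerpt (nucleotides 241-480) of SEQ ID NO: 51, and the mutant is derived from it using the single G-to-A difference shown at that location in SEQ ID NO: 61. SEQ ID NOs: 55 and 56 are used as the primer pair because both match this excerpt; that pairing is an assumption. The enzyme is likewise an assumption: a recognition site CCCGGG cut after the third base (the SmaI pattern) is used for illustration, since the claims only require an enzyme whose site includes nucleotide 681. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Nucleotides 241..480 of SEQ ID NO: 51 (wildtype); spaces are layout only.
WILDTYPE = (
    "TATATTTATA GTTTTAAATT ACAACCAGAG CTTGGCATAT TGTATCTATA CCTTTAATAA "
    "ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC "
    "ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT "
    "GATATTTTGG AGAAAGTAAA AGAACACCAA GAATCGATGG ACATCAACAA CCCTCGGGAC"
).replace(" ", "")
# Within this excerpt SEQ ID NO: 61 (mutant) differs by one G-to-A change.
MUTANT = WILDTYPE.replace("TCCCGGGAAC", "TCCCAGGAAC")

FORWARD = "AATTACAACCAGAGCTTGGC"    # SEQ ID NO: 55
REVERSE = "TATCACTTTCCATAAAAGCAAG"  # SEQ ID NO: 56


def amplify(template, forward, reverse):
    """Return the segment bounded by exact matches of both primers."""
    start = template.find(forward)
    end = template.find(reverse_complement(reverse))
    if start < 0 or end < start:
        raise ValueError("primers do not bracket a product on this template")
    return template[start:end + len(reverse)]


def digest(fragment, site="CCCGGG", cut_offset=3):
    """Cut inside every occurrence of site; return the fragment lengths."""
    lengths = []
    previous = 0
    index = fragment.find(site)
    while index >= 0:
        cut = index + cut_offset
        lengths.append(cut - previous)
        previous = cut
        index = fragment.find(site, index + 1)
    lengths.append(len(fragment) - previous)
    return lengths


wildtype_fragments = digest(amplify(WILDTYPE, FORWARD, REVERSE))
mutant_fragments = digest(amplify(MUTANT, FORWARD, REVERSE))
print(wildtype_fragments, mutant_fragments)
```

Under these assumptions the wildtype product is cut into two fragments, while the mutant product, having lost the site, remains uncut.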

12. The method of claim 8, wherein the polymorphism occurs at
nucleotide 636 of the coding region of the P450 2C19 DNA genomic
sequence.

13. The method of claim 12, wherein the first subsequence of the
sense strand is upstream from nucleotide 636 of the coding
region, and the second subsequence of the antisense strand is
downstream of nucleotide 636 of the coding region.

14. The method of claim 13, wherein the analyzing step comprises
digesting the amplified DNA segment with a restriction enzyme
that recognizes a site including nucleotide 636 of the coding
region.
15. The method of claim 8, wherein the polymorphism occurs at
nucleotide 636 or 681 of the coding region of the P450 2C19 DNA
genomic sequence, wherein the first subsequence of the sense
strand is upstream from nucleotide 636 of the coding region, and
the second subsequence of the antisense strand is downstream of
nucleotide 681 of the coding region.

16. The method of claim 9, wherein the forward primer has
    about 10-50 contiguous nucleotides from the wildtype 2C19
sequence (SEQ. ID. No. 51) shown in Fig. 16 including the
nucleotide at position 681 of the coding region;
    wherein the forward primer primes amplification from the
complement of the wildtype 2C19 sequence without priming
amplification from the complement of the mutant 2C19 sequence
shown in Fig. 16 (SEQ. ID. No. 61).

17. The method of claim 16, wherein the 3' nucleotide of the
forward primer is the nucleotide at position 681.

18. The method of claim 9, wherein the reverse primer has
    about 10-50 contiguous nucleotides from the complement of
the wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16
including the complement to nucleotide 681 of the coding region;
    wherein the reverse primer primes amplification from the
wildtype 2C19 sequence without priming amplification from the
mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16.

19. The method of claim 18, wherein the 3' nucleotide of the
reverse primer is the complement of the nucleotide at position
681.
20. The method of claim 9, wherein the forward primer has
    about 10-50 contiguous nucleotides from the mutant 2C19
sequence shown in Fig. 16 including the nucleotide at position
681 of the coding sequence,
    wherein the forward primer primes amplification from the
complement of the mutant 2C19 sequence (SEQ. ID. No. 61) without
priming amplification from the complement of the wildtype 2C19
(SEQ. ID. No. 51) sequence shown in Fig. 16.

21. The method of claim 20, wherein the 3' nucleotide of the
forward primer is the nucleotide at position 681.

22. The method of claim 9, wherein the reverse primer has
    about 10-50 contiguous nucleotides from the complement of
the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16
including the complement to nucleotide 681 of the coding region;
    wherein the reverse primer primes amplification from the
mutant 2C19 sequence without priming amplification from the
wildtype 2C19 (SEQ. ID. No. 51) sequence shown in Fig. 16.

23. The method of claim 22, wherein the 3' nucleotide of the
reverse primer is the complement of the nucleotide at position
681.
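
The allele-specific forward primers of claims 16, 17, 20 and 21 place the polymorphic nucleotide at the 3' end of the primer. The sketch below illustrates that layout on nucleotides 301-420 of SEQ ID NO: 51 (wildtype) and SEQ ID NO: 61 (mutant). The single difference within this excerpt is taken to be the position-681 polymorphism, which is an assumption, as are the 20-nucleotide primer length (chosen from within the claimed range of about 10-50) and the exact-match rule used as a crude stand-in for priming specificity. All names are hypothetical.

```python
# Nucleotides 301..420 of SEQ ID NO: 51 (wildtype); spaces are layout only.
WILDTYPE = (
    "ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC "
    "ATTGATTATT TCCCGGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT"
).replace(" ", "")
# Nucleotides 301..420 of SEQ ID NO: 61 (mutant).
MUTANT = (
    "ATGCTTTTAA TTTAATAAAT TATTGTTTTC TCTTAGATAT GCAATAATTT TCCCACTATC "
    "ATTGATTATT TCCCAGGAAC CCATAACAAA TTACTTAAAA ACCTTGCTTT TATGGAAAGT"
).replace(" ", "")


def allele_specific_primer(seq, position, length=20):
    """Forward primer whose 3' (last) base is seq[position], 0-based."""
    start = position - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this length")
    return seq[start:position + 1]


def primes(primer, sense_strand):
    """Crude specificity rule: require an exact match, 3' base included.

    A forward primer anneals to the complement of the sense strand, so
    an exact match to the sense strand stands in for full complementarity.
    """
    return primer in sense_strand


# 0-based index of the single difference between the two excerpts.
snp_index = next(
    i for i, (w, m) in enumerate(zip(WILDTYPE, MUTANT)) if w != m
)
wildtype_primer = allele_specific_primer(WILDTYPE, snp_index)
mutant_primer = allele_specific_primer(MUTANT, snp_index)
print(wildtype_primer, mutant_primer)
```

The two primers differ only in their final base, so each one matches its own allele and fails the exact-match rule on the other. A real assay would depend on annealing conditions rather than on exact string identity.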

24. The method of claim 12, wherein the forward primer has
    about 10-50 contiguous nucleotides from the wildtype 2C19
sequence (SEQ. ID. No. 52) shown in Fig. 17 including the
nucleotide at position 636 of the coding region;
    wherein the forward primer primes amplification from the
complement of the wildtype 2C19 sequence (SEQ. ID. No. 54)
without priming amplification from the complement of the mutant
2C19 sequence shown in Fig. 17.

1 25. The method of claim 12, wherein the reverse

2 primer has 

3 about 10-50 contiguous nucleotides from the 

4 complement of the wildtype 2C19 sequence (SEQ. ID. No. 52)

5 shown in Fig. 17 including the complement to nucleotide 636 of

6 the coding region; 

7 wherein the reverse primer primes amplification 

8 from the wildtype 2C19 sequence without priming amplification 

9 from the mutant 2C19 sequence (SEQ. ID. No. 54) shown in 
10 Fig. 17. 

1 26. The method of claim 12, wherein the forward 

2 primer has 

3 about 10-50 contiguous nucleotides from the 

4 mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17 

5 including the nucleotide at position 636 of the coding 

6 sequence, 

7 wherein the forward primer primes amplification 

8 from the complement of the mutant 2C19 sequence without 

9 priming amplification from the complement of the wildtype 2C19 
10 sequence (SEQ. ID. No. 52) shown in Fig. 17.

1 27. The method of claim 12, wherein the reverse 

2 primer has 

3 about 10-50 contiguous nucleotides from the 

4 complement of the mutant 2C19 sequence (SEQ. ID. No. 54) shown 

5 in Fig. 17 including the complement to nucleotide 636 of the 

6 coding region; 

7 wherein the reverse primer primes amplification 

8 from the mutant 2C19 sequence without priming amplification 

9 from the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in 
10 Fig. 17. 

1 28. The method of claim 6, wherein the segment of 

2 the 2C19 sequence to be amplified is a cDNA sequence, and the 

3 method further comprises the step of reverse transcribing mRNA 

4 in the sample to produce the cDNA sequence. 




1 29. The method of claim 28, wherein the forward 

2 primer comprises about 10-50 contiguous nucleotides upstream 

3 of nucleotide 643 of the coding region of the wildtype 2C19 

4 cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12 and

5 hybridizes to the complement of the 2C19 sequence upstream 

6 from nucleotide 643 of the coding region, and the reverse 

7 primer comprises about 10-50 contiguous nucleotides from the 

8 complement of the wildtype 2C19 cDNA sequence (SEQ. ID. No. 49)

9 shown in Fig. 12 and hybridizes to the 2C19 sequence 
10 downstream from nucleotide 682 of the coding region. 
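
By way of non-limiting illustration of the primer placement recited in claim 29, the following Python sketch checks that a forward primer lies wholly upstream of nucleotide 643 and a reverse primer wholly downstream of nucleotide 682, and reports the resulting amplicon length. The `Primer` class, its coordinate convention (1-based positions on the coding strand) and the example coordinates are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    start: int  # first coding-strand position covered by the primer (1-based)
    end: int    # last coding-strand position covered by the primer (1-based)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def flanks_region(forward: Primer, reverse: Primer,
                  region_start: int = 643, region_end: int = 682) -> bool:
    """True if both primers are 10-50 nt long and bracket the region."""
    sizes_ok = all(10 <= p.length <= 50 for p in (forward, reverse))
    return sizes_ok and forward.end < region_start and reverse.start > region_end


def amplicon_length(forward: Primer, reverse: Primer) -> int:
    """Length of the product spanning both primers, inclusive."""
    return reverse.end - forward.start + 1


# Hypothetical primer coordinates chosen only to exercise the check.
forward = Primer(600, 619)
reverse = Primer(700, 719)
print(flanks_region(forward, reverse), amplicon_length(forward, reverse))
```

A primer that overlaps the 643 to 682 interval, such as `Primer(640, 659)`, fails the check because it no longer lies wholly upstream of nucleotide 643.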

1 30. The method of claim 28, wherein the forward 

2 primer hybridizes to the complement of the wildtype 2C19 cDNA 

3 sequence (SEQ. ID. No. 49) shown in Fig. 12 between 

4 nucleotides 643 and 682 without hybridizing to the complement 

5 of the mutant 2C19 cDNA sequence (SEQ. ID. No. 50) shown in 

6 Fig. 12. 

1 31. The method of claim 30, wherein the reverse 

2 primer hybridizes to the wildtype 2C19 cDNA sequence (SEQ. ID. 

3 No. 49) shown in Fig. 12 between nucleotides 643 and 682 

4 without hybridizing to the mutant 2C19 cDNA sequence (SEQ. ID. 

5 No. 50) shown in Fig. 12. 

1 32. The method of claim 28, wherein the forward 

2 primer comprises about 10-50 contiguous nucleotides upstream 

3 of nucleotide 636 of the coding region of the wildtype 2C19 

4 cDNA sequence (SEQ. ID. No. 49) shown in Fig. 12, and the 

5 reverse primer comprises about 10-50 contiguous nucleotides 

6 from the complement of the wildtype 2C19 cDNA sequence (SEQ. 

7 ID. No. 49) shown in Fig. 12 downstream from nucleotide 636 of 

8 the coding region. 

1 33. The method of claim 28, wherein the full-length 

2 2C19 cDNA sequence is amplified. 



1 34. The method of claim 33, further comprising the 

2 step of sequencing a segment of the 2C19 cDNA sequence. 

1 35. The method of claim 5, further comprising the

2 step of: 

3 digesting the DNA with a restriction enzyme 

4 that recognizes a site including nucleotide 636 or 681 of the 

5 2C19 DNA sequence; 

6 wherein: 

7 the 2C19 DNA sequence is genomic; and 

8 the analyzing step comprises detecting the 

9 products resulting from the digestion by Southern blotting 

10 with a labelled segment of the 2C19 DNA sequence as a probe.
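
By way of illustration only, the restriction analysis of claim 35 can be sketched in Python: a single nucleotide change inside a recognition site abolishes cleavage, so the two alleles give different fragment lengths. The site `GGATCC` (the BamHI recognition sequence, cut after the first G) is used purely as a familiar example and is not asserted to be the enzyme contemplated by the claim. The sequences, the position 9 and the function names are hypothetical.

```python
from __future__ import annotations


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start index of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def site_covers(seq: str, site: str, pos: int) -> bool:
    """True if some occurrence of `site` includes 1-based position `pos`."""
    return any(i < pos <= i + len(site) for i in site_positions(seq, site))


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting `cut_offset` bases into each site."""
    cuts = [i + cut_offset for i in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy alleles: the mutant changes position 9 inside the site.
WILD = "AAGCTTGGATCCTTAGC"
MUT = WILD[:8] + "G" + WILD[9:]
SITE, CUT_OFFSET, POS = "GGATCC", 1, 9

print(site_covers(WILD, SITE, POS), fragment_lengths(WILD, SITE, CUT_OFFSET))
print(site_covers(MUT, SITE, POS), fragment_lengths(MUT, SITE, CUT_OFFSET))
```

On a Southern blot the two alleles would then be distinguished by the sizes of the bands detected with the labelled probe; the sketch models only the digestion step.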

1 36. A diagnostic kit comprising: 

2 a forward primer sufficiently complementary 

3 with a first subsequence of the antisense strand of a double- 

4 stranded 2C19 genomic DNA sequence to hybridize therewith, and 

5 a reverse primer sufficiently complementary with a second 

6 subsequence of the sense strand of the 2C19 genomic sequence 

7 to hybridize therewith; 

8 wherein the first subsequence is upstream of 

9 nucleotide 681 of the coding region, and second subsequence is 
10 downstream of nucleotide 681 of the coding region. 

1 37. The diagnostic kit of claim 36, wherein the 

2 first subsequence is upstream from nucleotide 636 of the 

3 coding region. 

1 38. The diagnostic kit of claim 36, wherein the 

2 forward primer has about 10-50 contiguous nucleotides from the 

3 wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16, and 

4 the reverse primer has about 10-50 contiguous nucleotides from 

5 the complement of the wildtype 2C19 sequence (SEQ. ID. No. 51) 

6 shown in Fig. 16. 

1 39. The diagnostic kit of claim 38, further

2 comprising 

3 a second forward primer sufficiently 

4 complementary with a first subsequence of the antisense strand 

5 of a double-stranded 2C19 genomic DNA sequence to hybridize 

6 therewith, and a second reverse primer sufficiently

7 complementary with a second subsequence of the sense strand of 

8 the 2C19 genomic sequence to hybridize therewith; 

9 wherein the first subsequence is upstream of 

10 nucleotide 636 of the coding region, and second subsequence is 

11 downstream of nucleotide 636 of the coding region.

1 40. The diagnostic kit of claim 39, further 

2 comprising a restriction enzyme that recognizes a site that 

3 includes nucleotide 681 or nucleotide 636 of the coding 

4 region. 

1 41. A primer selected from the group consisting of:

2 (a) a first forward primer having: 

3 about 10-50 contiguous nucleotides from 

4 the wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16 

5 including the nucleotide at position 681 of the coding region; 

6 wherein the first forward primer primes 

7 amplification from the complement of the wildtype 2C19 

8 sequence without priming amplification from the complement of 

9 the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16; 

10 (b) a first reverse primer having: 

11 about 10-50 contiguous nucleotides from 

12 the complement of the wildtype 2C19 sequence (SEQ. ID. No. 51) 

13 shown in Fig. 16 including the complement to nucleotide 681 of 

14 the coding region; 

15 wherein the first reverse primer primes

16 amplification from the wildtype 2C19 sequence without priming

17 amplification from the mutant 2C19 sequence shown in Fig. 16;

18 (c) a second forward primer having:

19 about 10-50 contiguous nucleotides from 

20 the mutant 2C19 sequence (SEQ. ID. No. 61) shown in Fig. 16 

21 including the nucleotide at position 681 of the coding sequence, 

22 wherein the second forward primer primes

23 amplification from the complement of the mutant 2C19 sequence 

24 without priming amplification from the complement of the 

25 wildtype 2C19 sequence (SEQ. ID. No. 51) shown in Fig. 16;

26 (d) a second reverse primer having: 

27 about 10-50 contiguous nucleotides from 

28 the complement of the mutant 2C19 sequence (SEQ. ID. No. 61) 

29 shown in Fig. 16 including the complement to nucleotide 681 of 

30 the coding region; 

31 wherein the second reverse primer primes 

32 amplification from the mutant 2C19 sequence without priming 

33 amplification from the wildtype 2C19 sequence (SEQ. ID. 

34 No. 51) shown in Fig. 16;

35 (e) a third forward primer having: 

36 about 10-50 contiguous nucleotides from 

37 the wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17

38 including the nucleotide at position 636 of the coding region; 

39 wherein the third forward primer primes

40 amplification from the complement of the wildtype 2C19 

41 sequence without priming amplification from the complement of 

42 the mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17; 

43 (f) a third reverse primer having: 

44 about 10-50 contiguous nucleotides from 

45 the complement of the wildtype 2C19 sequence (SEQ. ID. No. 52) 

46 shown in Fig. 17 including the complement to nucleotide 636 of 

47 the coding region; 

48 wherein the third reverse primer primes

49 amplification from the wildtype 2C19 sequence without priming 

50 amplification from the mutant 2C19 sequence (SEQ. ID. No. 54) 

51 shown in Fig. 17; 

52 (g) a fourth forward primer having: 

53 about 10-50 contiguous nucleotides from 

54 the mutant 2C19 sequence (SEQ. ID. No. 54) shown in Fig. 17 

55 including the nucleotide at position 636 of the coding 

56 sequence, 

57 wherein the fourth forward primer primes

58 amplification from the complement of the mutant 2C19 sequence 

59 without priming amplification from the complement of the

60 wildtype 2C19 sequence (SEQ. ID. No. 52) shown in Fig. 17; and

61 (h) a fourth reverse primer having: 

62 about 10-50 contiguous nucleotides from 

63 the complement of the mutant 2C19 sequence (SEQ. ID. No. 54) 

64 shown in Fig. 17 including the complement to nucleotide 636 of

65 the coding region; 

66 wherein the fourth reverse primer primes 

67 amplification from the mutant 2C19 sequence without priming 

68 amplification from the wildtype 2C19 sequence (SEQ. ID. 

69 No. 52) shown in Fig. 17. 

25 GA GAAGGCTTCA

65 GAAGGCTTCA

29c AGCCTTGAAG TGAAAGCCCG CAGTTGTCTT ACTAAGAAGA GAAGCCTTCA

11a CTTCA

51 100
2c TTCAcTCTGG AGaCAGAGCT cTgGgAGAgG .Aa.CTCCCt cCTGGCCCCA
2c8 TTCACTCTGG AGACAGAGCT GTAGGAGAAG GAAGCTCCCT CCTGGCCCCA
25 TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA
65 TTCACTCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA
29c TTCACTCTGG AGGCAGAGCT CTGGAAGAGG GAGGCTCCCG TCTGGCCCCA
6b TTCACTCTGG AGGCAGAGCT CTGGAAGAGG GAGGCTCCCG TCTGGCCCCA
11a TTCAATCTGG AGACAGAGCT CTGGGAGAGG AAAACTCCCT CCTGGCCCCA



151 200
2c aGCAAATCcT TaACCAAT.T CTCAAAagTC TATGGcCCTG TGTTCACt.T
2c8 TGCAAATCTT TCACCAATTT CTCAAAAGTC TATGGTCCTG TGTTCACCGT
25 AGCAAATCCT TAACCAATCT CTCAAAGGTC TATGGCCCTG TGTTCACTCT
65 AGCAAATCCT TAACCAATCT CTCAAAGGTC TATGGCCCTG TGTTCACTCT
29c AGCAAATCCT TAACCAATTT CTCAAAAGTC TATGGCCCTG TGTTCACTGT
6b AGCAAATCCT TAACCAATTT CTCAAAAGTC TATGGCCCTG TGTTCACTGT
11a AGCAAATCCT TAACCAATCT CTCAAAAATC TATGGCCCTG TGTTCACTCT



201 250
2c GTATTTTGGC cTGaAaCcCA TaGTGGTG.T gCATGGATAT GAaGcaGTGA
2c8 GTATTTTGGC ATGAATCCCA TAGTGGTGTT TCATGGATAT GAGGCAGTGA
25 GTATTTTGGC CTGAAACCCA TAGTGGTGCT GCATGGATAT GAAGCAGTGA
65 GTATTTTGGC CTGAAACCCA TAGTGGTGCT GCATGGATAT GAAGCAGTGA
29c GTATTTTGGC CTGAAGCCCA TTGTGGTGTT GCATGGATAT GAAGCAGTGA
11a GTATTTTGGC CTGGAACGCA TGGTGGTGCT GCATGGATAT GAAGTGGTGA



251 300
2c AGGAaGCCCT GATTGATc.T GGAGAGGAGT TTTCTGGAAG AGGca.TTtc
2c8 AGGAAGCCCT GATTGATAAT GGAGAGGAGT TTTCTGGAAG AGGCAATTCC
25 AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCATTTTC
65 AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCATTTTC
29c AGGAGGCCCT GATTGATCAT GGAGAGGAGT TTTCTGGAAG AGGAAGTTTT
6b AGGAGGCCCT GATTGATCAT GGAGAGGAGT TTTCTGGAAG AGGAAGTTTT
11a AGGAAGCCCT GATTGATCTT GGAGAGGAGT TTTCTGGAAG AGGCCATTTC



301 350
2c CCAcTggCTg AAAgAg.TAa CA.AGGA.TT GGAATcgTTT tCAGCAATGG
2c8 CCAATATCTC AAAGAATTAC TAAAGGACTT GGAATCATTT CCAGCAATGG
25 CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG
65 ?????????? AAAGAGCTAA CAGAGGATTT GGAATTGTTT TCAGCAATGG
29c CCAGTGGCTG AAAAAGTTAA CAAAGGACTT GGAATCCTTT TCAGCAATGG
6b CCAGTGGCTG AAAAAGTTAA CAAAGGACTT GGAATCCTTT TCAGCAATGG
11a CCACTGGCTG AAAGAGCTAA CAGAGGATTT GGAATCGTTT ??????????

351 400
2c AAAGAgATGG AAGGAGATCC GGCGTTTCTc CCTCAtgAcg cTGCGGAATT
2c8 AAAGAGATGG AAGGAGATCC GGCGTTTCTC CCTCACAAAC TTGCGGAATT
25 AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT
65 AAAGAAATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT
29c AAAGAGATGG AAGGAGATCC GGCGTTTCTG CCTCATGACT CTGCGGAATT
6b AAAGAGATGG AAGGAGATCC GGCGTTTCTG CCTCATGACT CTGCGGAATT
11a AAAGAGATGG AAGGAGATCC GGCGTTTCTC CCTCATGACG CTGCGGAATT

401 450
2c TTGGGATGGG GAAGAGGAGC ATtGAGGACC GTGTTCAAGA GGAAGCcCgC
2c8 TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCTCAC
25 TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC
65 TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC
29c TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
6b TTGGGATGGG GAAGAGGAGC ATCGAGGACC GTGTTCAAGA GGAAGCCCGC
11a TTGGGATGGG GAAGAGGAGC ATTGAGGACC GTGTTCAAGA GGAAGCCCGC

451 500
2c TGCCTTGTGG AGGAGTTGAG AAAAACCAAg GCcTCACCCT GTGATCCCAC
2c8 TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC
25 TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
65 TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCCTCACCCT GTGATCCCAC
29c TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
6b TGCCTTGTGG AGGAGTTGAG AAAAACCAAT GCCTCACCCT GTGATCCCAC
11a TGCCTTGTGG AGGAGTTGAG AAAAACCAAG GCTTCACCCT GTGATCCCAC

501 550
2c TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCc .TTaTTTTCC
2c8 TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC GTTGTTTTCC
25 TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
65 TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC
29c TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
6b TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCT GTTATTTTCC
11a TTTCATCCTG GGCTGTGCTC CCTGCAATGT GATCTGCTCC ATTATTTTCC

551 600
2c AtaAaCG.TT tGATTATAAA GATCAG.aaT TTCTtAaCt? gATGgAAAaa
2c8 AGAAACGATT TGATTATAAA GATCAGAATT TTCTCACCC? GATGAAAAGA
25 ATAAACGTTT TGATTATAAA GATCAGCAAT ?????????? AATGGAAAAG
65 ATAAACGTTT TGATTATAAA GATCAGCAAT TTCTTAACT? AATGGAAAAG
29c ATGATCGATT TGATTATAAA GATCAGAGGT ?????????? GATGGAAAAA
6b ATGATCGATT TGATTATAAA GATCAGAGGT ?????????? GATGGAAAAA
11a AGAAACGTTT CGATTATAAA GATCAGCAAT ?????????? GATGGAAAAA

FIG. 2-3.



601 650
2c TT.AATGAAA ACaTCAgGAT TcTgAgC.cc CC.TGGATCC AG.TcTGCAA
2c8 TTCAATGAAA ACTTCAGGAT TCTGAACTCC CCATGGATCC AGGTCTGCAA
25 TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
65 TTGAATGAAA ACATCAAGAT TTTGAGCAGC CCCTGGATCC AGATCTGCAA
29c TTCAATGAAA ACCTCAGGAT TCTGAGCTCT CCATGGATCC AGGTCTGCAA
6b TTCAATGAAA ACCTCAGGAT TCTGAGCTCT CCATGGATCC AGGTCTGCAA
11a TTGAATGAAA ACATCAGGAT TGTAAGCACC CCCTGGATCC AGATATGCAA



651 700
2c TAATTT.cCt cct.TCATtG ATTattTCCC .GGAActCA. AAcAAAtTac
2c8 TAATTTCCCT CTACTCATTG ATTGTTTCCC AGGAACTCAC AACAAAGTGC
25 TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
65 TAATTTTTCT CCTATCATTG ATTACTTCCC GGGAACTCAC AACAAATTAC
29c TAATTTCCCT GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
6b TAATTTCCCT GCTCTCATCG ATTATCTCCC AGGAAGTCAT AATAAAATAG
11a TAATTTTCCC ?????????? ATTATTTCCC GGGAACCCAT AACAAATTAC



701 750
2c tTaAAAA.gT TGCTtttAtg aaAAGTtAta TtttGGAgAa AgTAAAAGAA
2c8 TTAAAAATGT TGCTCTTACA CGAAGTTACA TTAGGGAGAA AGTAAAAGAA
25 TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
65 TTAAAAACGT TGCTTTTATG AAAAGTTATA TTTTGGAAAA AGTAAAAGAA
29c CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
6b CTGAAAATTT TGCTTACATT AAAAGTTATG TATTGGAGAG AATAAAAGAA
11a TTAAAAACCT TGCTTTTATG GAAAGTGATA TTTTGGAGAA AGTAAAAGAA





751 800
2c CAcCAAGaAT Ca.TGGAcaT gAACAa.CCT CgGGACTTTA TtGATTGcTT
2c8 CACCAAGCAT CACTGGATGT TAACAATCCT CGGGACTTTA TGGATTGCTT
25 CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
65 CACCAAGAAT CAATGGACAT GAACAACCCT CAGGACTTTA TTGATTGCTT
29c CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
6b CATCAAGAAT CCCTGGACAT GAACAGTGCT CGGGACTTTA TTGATTGTTT
11a CACCAAGAAT CGATGGACAT CAACAACCCT CGGGACTTTA TTGATTGCTT



801 850
2c CCTGATcAAA ATGGAg.AGG AAAAGcAcAA cCAAcagTCt GAATTtAcTa
2c8 CCTGATCAAA ATGGAGCAGG AAAAGGACAA CCAAAAGTCA GAATTCAATA
25 CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
65 CCTGATGAAA ATGGAGAAGG AAAAGCACAA CCAACCATCT GAATTTACTA
29c CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
6b CCTGATCAAA ATGGAACAGG AAAAGCACAA TCAACAGTCT GAATTTACTG
11a CCTGATCAAA ?????????? AAAAGCAAAA CCAACAGTCT ??????????

FIG. 2-4.



851 900
2c TTGAAAgCTT Ggta..CACT G.AgcTGA.t TgtTTGgaGC TGG.ACAGAG
2c8 TTGAAAACTT GGTTGGCACT GTAGCTGATC TATTTGTTGC TGGAACAGAG
25 TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC TGGGACAGAG
65 TTGAAAGCTT GGAAAACACT GCAGTTGACT TGTTTGGAGC TGGGACAGAG
29c TTGAAAGCTT GATAGCCACT GTAACTGATA ?????????? TGGAACAGAG
6b TTGAAAGCTT GATAGCCACT GTAACTGATA TGTTTGGGGC TGGAACAGAG
11a TTGAAAACTT GGTAATCACT GCAGCTGACT TACTTGGAGC TGGGACAGAG



901 950
2c ACaACaAGCA C.AC.CTGAG ATATG..CTC CT.CTCCTGC TGAAGcACCC
2c8 ACAACAAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGCACCC
25 ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
65 ACGACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC
29c ACAACGAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGTACCC
6b ACAACGAGCA CCACTCTGAG ATATGGACTC CTGCTCCTGC TGAAGTACCC
11a ACAACAAGCA CAACCCTGAG ATATGCTCTC CTTCTCCTGC TGAAGCACCC



951 1000
2c AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAacgTGTa aTTGGCAGAa
2c8 AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGATCATGTA ATTGGCAGAC
25 AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
65 AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTG ATTGGCAGAA
29c AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
6b AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAATGTGTA GTTGGCAGAA
11a AGAGGTCACA GCTAAAGTCC AGGAAGAGAT TGAACGTGTC ATTGGCAGAA



1001 1050
2c ACCGGAGCCC CTGcATGCAg GAcAGGaGcC ACATGCCcTA CACaGATGCT
2c8 ACAGGAGCCC CTGCATGCAG GATAGGAGCC ACATGCCTTA CACTGATGCT
25 ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
65 ACCGGAGCCC CTGCATGCAA GACAGGAGCC ACATGCCCTA CACAGATGCT
29c ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
6b ACCGGAGCCC CTGTATGCAG GACAGGAGTC ACATGCCCTA CACAGATGCT
11a ACCGGAGCCC CTGCATGCAG GACAGGGGCC ACATGCCCTA CACAGATGCT



1051 1100
2c GTgGTGCACG AG.TCCAGAG ATACattGAC CT.cTCCCCA CCagccTGCC
2c8 GTAGTGCACG AGATCCAGAG ATACAGTGAC CTTGTCCCCA CCGGTGTGCC
25 GTGGTGCACG AGGTCCAGAG ATACCTTGAC CTTCTCCCCA CCAGCCTGCC
65 GTGGTGCACG AGGTCCAGAG ATACATTGAC CTTCTCCCCA CCAGCCTGCC
29c GTGGTGCACG AGATCCAGAG ATACATTGAC ?????????? CCAACCTGCC
6b GTGGTGCACG AGATCCAGAG ATACATTGAC ?????????? CCAACCTGCC
11a ?????????? AGGTCCAGAG ATACATCGAC ?????????? ??????????

FIG. 2-5.



1101 1150
2c CCATGCAGTG ACCtgTGA.. tTAAaTTCAg AAACTAcCTC AT.CCCAAGG
2c8 CCATGCAGTG ACCACTGATA CTAAGTTCAG AAACTACCTC ATCCCCAAGG
25 CCATGCAGTG ACCTGTGACA TTAAATTCAG AAACTATCTC ATTCCCAAGG
65 CCATGCAGTG ACCTGTGACA TTAAATTCAG AAACTATCTC ATTCCCAAGG
29c CCATGCAGTG ACCTGTGATG TTAAATTCAA AAACTACCTC ATCCCCAAGG
6b CCATGCAGTG ACCTGTGATG TTAAATTCAA AAACTACCTC ATCCCCAAGG
11a CCATGCAGTG ACCTGTGACG TTAAATTCAG AAACTACCTC ATTCCCAAGG

1151 * 1200
2c GCAcaACCAT A.Taac.Tcc CTgACTTCtG TGCTaCAtgA ..ACAAAGAA
2c8 GCACAACCAT AATGGCATTA CTGACTTCCG TGCTACATGA TGACAAAGAA
25 GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA CAACAAAGAA
65 GCACAACCAT ATTAATTTCC CTGACTTCTG TGCTACATGA CAACAAAGAA
29c GCATGACCAT AATAACATCC CTGACTTCTG TGCTGCACAA TGACAAAGAA
6b GCATGACCAT AATAACATCC CTGACTTCTG TGCTGCACAA TGACAAAGAA
11a GCACAACCAT ATTAACTTCC CTGACTTCTG TGCTACATGA CAACAAAGAA





1201 1250
2c TTtCCcAAcC CAgAgATgTT TGACCCT.g. CACTTTCTgG AT.A..gTGG
2c8 TTTCCTAATC CAAATATCTT TGACCCTGGC CACTTTCTAG ATAAGAATGG
25 TTTCCCAACC CAGAGATGTT TGACCCTCAT CACTTTCTGG ATGAAGGTGG
65 TTTCCCAACC CAGAGATGTT TGACCCTCAT CACTTTCTGG ATGAAGGTGG
29c TTCCCCAACC CAGAGATGTT TGACCCTGGC CACTTTCTGG ATAAGAGTGG
6b TTCCCCAACC CAGAGATGTT TGACCCTGGC CACTTTCTGG ATAAGAGTGG
11a TTTCCCAACC CAGAGATGTT TGACCCTCGT CACTTTCTGG ATGAAGGTGG

1251 1300
2c cAA.TTTAAG AAAAGT.AcT ACTTCATGCC TTTCTCAGCA GGAAAACGcA
2c8 CAACTTTAAG AAAAGTGACT ACTTCATGCC TTTCTCAGCA GGAAAACGAA
25 CAATTTTAAG AAAAGTAAAT ACTTCATGCC TTTCTCAGCA GGAAAACGGA
65 CAATTTTAAG AAAAGTAAAT ACTTCATGCC TTTCTCAGCA GGAAAACGGA
29c CAACTTTAAG AAAAGTGACT ACTTCATGCC TTTCTCAGCA GGAAAACGGA
6b CAACTTTAAG AAAAGTGACT ACTTCATGCC TTTCTCAGCA GGAAAACGGA
11a AAATTTTAAG AAAAGTAACT ACTTCATGCC TTTCTCAGCA GGAAAACGGA





1301 1350
2c TtTGTg?gGG AGA?GgcCTg GCCcGCATGG AGCTgTTTTT ATTcCTgACC
2c8 TTTGTGCAGG AGAAGGACT? GCCCGCATGG AGCTATTTTT ATTTCTAACC
25 TTTGTGTGGG AGAAGCC?T? ?????????? AGCTGTTTTT ATTCCTGACC
65 TTTGTGTGGG AGAAGCCCTG GCCGGCATGG AGCTGTTTTT ATTCCTGACC
29c TGTGTATGGG AGAGGGCCT? GCCCGCATGG AGCTGTTTTT ATTCCTGACC
6b TGTGTATGGG AGAGGGCCT? ?????????? AGCTGT???? ??????????
11a ?????????? ?????????? ?????????? ?????????? ??????????

FIG. 2-6.



1351 1400
2c .ccATTTTaC AGAACTTTAA CCTGAAATCT ctggtTGAcc cAAAG.AccT
2c8 ACAATTTTAC AGAACTTTAA CCTGAAATCT GTTGATGATT TAAAGAACCT
25 TCCATTTTAC AGAACTTTAA CCTGAAATCT CTGGTTGACC CAAAGAACCT
65 TCCATTTTAC AGAACTTTAA CCTGAAATCT CTGGTTGACC CAAAGAACCT
29c ACCATTTTGC AGAACTTTAA CCTGAAATCT CAGGTTGACC CAAAGGATAT
6b ACCATTTTGC AGAACTTTAA CCTGAAATCT CAGGTTGACC CAAAGGATAT
11a TTCATTTTAC AGAACTTTAA CCTGAAATCT CTGATTGACC CAAAGGACCT

1401 1450
2c tgAcAccACt cCagTTg.CA AtGgatTTGc ttcTgTgCC. CCCTtcTAcC
2c8 CAATACTACT GCAGTTACCA AAGGGATTGT TTCTCTGCCA CCCTCATACC
25 TGACACCACT CCAGTTGTCA ATGGATTTGC CTCTGTGCCG CCCTTCTACC
65 TGACACCACT CCAGTTGTCA ATGGATTTGC CTCTGTGCCG CCCTTCTACC
29c TGACATCACC CCCATTGCCA ATGCATTTGG TCGTGTGCCA CCCTTGTACC
6b TGACATCACC CCCATTGCCA ATGCATTTGG TCGTGTGCCA CCCTTGTACC
11a TGACACAACT CCTGTTGTCA ATGGATTTGC TTCTGTCCCG CCCTTCTATC

1451 *** 1500
2c AGcT.TGCTT CATtCCTGTC TGAAGAAggg cAGatggtcT GGCTGCT.cT
2c8 AGATCTGCTT CATCCCTGTC TGAAGAATGC TAGCCCATCT GGCTGCTGAT
25 AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT GGCTGCTGCT
65 AGCTGTGCTT CATTCCTGTC TGAAGAAGAG CAGATGGCCT GGCTGCTGCT
29c AGCTGTGCTT CATTCCTGTC TGAAGAAGGG CAGATAGTTT GGCTGCTCCT
6b AGCTCTGCTT CATTCCTGTC TGAAGAAGGG CAGATAGTTT GGCTGCTCCT
11a AGCTGTGCTT CATTCCTGTC TGAAGAAGCA CAGATGGTCT GGCTGCTCCT

1501 1550 

2c gTGCtgTC.C t . . . ttt . . tctgg ggcaattt cC . -ctt.cat. 

2c8 CTGCTATCAC CTGCAACTCT TTTTTTATCA AGGACATTCC CACTATTATG 

25 GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT 

65 GTGCAGTCCC TGCAGCTCTC TTTCCTCTGG GGCATTATCC ATCTTTCACT 

29c GTGCTGTCAC CTGCAATTCT CCCTTATCAG GGCCATTAGC CTCTCCCTTC

6b GTGCTGTCAC CTGCAATTCT CCCTTATCAG GGCCATTGGC CTCTCCCTTC 

11a GTGCTGTCCC TGCAGCTCTC TTTCCTCTGG TCCAAATTTC ACTA.TCTGTG 

1551 1600 

2c . .t.tt..tg c • .ttt . Tea tcTg . catct caca.t.c. cttcccrta. 

2c8 TCTTCTCTGA CCTCTCATCA AATCTTCCCA TTCACTCAAT ATCCCATAAG 

25 ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA 

65 ATCTGTAATG CCTTTTCTCA CCTGTCATCT CACATTTTCC CTTCCCTGAA 

29c TCTCTGTGAG GGATATTTTC TCTGACTTGT CAATCCACAT CTTCCCATTC

6b TCTCTATGAG GGATATTTTC TCTGACTTGT CAATCCACAT CTTCCCATTC

11a ?????????? ?????????? ?????????? ?????????? ??????????

FIG. 2-7.





1601 1650
2c catc.Ag..a ccaTt.a... .caat.tcca agag.gtg.. ttt.Tt..ct
2c8 CATCCAAAC? CCATTAAGGA GAGTTGTTCA GGTCACTGCA CAAATATATC
25 GATCTAGTGA ACATTCGACC TTCATTACGG AGAGTTTCCT ATGTTTCACT
65 GATCTAGTGA ACATTCGACC TCCATTACTT AGAGTTTCCT ATGTTTCACT
29c CCTCAAGATC CAATGAACAT CCAACCTCCA TTAAAGAGAG TTTCTTGGGT
6b CCTCAAGATC CAATGAACAT CCAACCTCCA TTAAAGAGAG TTTCTTGGGT
11a TGAACATTCA GCCTCCATTA AAAAAGTTTC ACTGTGCAAA TATATCTGCT




1651 1700
2c ?????????? atctatc..t ....Ct.ct. t.t.t.-aT. actttgattg
2c8 TGCAATTATT CATACTCTGT AACACTTGTA TTAATTGCTG CATATGCTAA
25 GTGCAAATAT ATCTGCTATT CTCCATACTC TGTAACAGTT GCATTGACTG
65 ?????????? ATCTGCTATT CTCCATACTC TGTAACAGTT GCATTGACTG
29c ?????????? ATATATCTGC TATTCTCCAT ACTCTGTATC ACTTGTATTG
6b ?????????? ?????????? ?????????? ACTCTGTATC ACTTGTATTG
11a ?????????? CTCTATAATA ?????????? GTGCCACATA ATGCTGATAC



1701 1750 
2c tec. eta. tg aTg.taatt. tttaatattg ..ttattg.. A...t.ttAt 

2c8 TACTTTTCTA ATGCTGACTT TTTAATATGT TATCACTGTA AAACACAGAA 
25 TCACATAATG CTCATACTTA TCTAATGTTG AGTTATTAAT ATGTTATTAT 
65 TCACATAATG CTCATACTTA TCTAATGTTG AGTTATTAAT ATGTTATTAT 

29c ACCACCACAT ATGCTAATAC CTATCTACTG CTGAGTTGTC AGTATGTTAT 
6b ACCACCACAT ATGCTAATAC CTATCTACTG CTGAGTTGTC AGTATGTTAT 

11a TTGTCTAATG TTGAGTTATT AACATATTAT TATTAAATAG A 



1751 1800
2c .A.t.a.aaA .aaAtgAtaa rt.t.t..aa aT...aagtc A.tgc.tt. 

2c8 AAGTGATTAA TGAATGATAA TTTAGTCCAT TTCTTTTGTG AATGTGCTAA. 
25 TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC ATTTCTTTTC 
65 TAAATAGAGA AATATGATTT GTGTATTATA ATTCAAAGGC ATTTCTTTTC 

29c CACTAGAAAA CAAAGAAAAA TGATTAATAA ATGACAATTC AGAGCCAAAA
6b CACTATAAAA CAAAGAAAAA TGATTAATAA ATGACAATTC AGAGCCATTT

1801 1850 
2c a.. at. i.e. .aaTaaAaag cattaTtATT tgctgaaAaa aaGTCAGTTC 

2c8 ATAAA AAGTG TTATTAATTG CTGGTTCA 
25 TGCATGTTCT AAATAAAAAG CATTATTATT TGCTGAAAAA AA
65 TGCATGTTCT AAATAAAAAG CATTATTATT TGCTGAAAAA AA

29c AAAAAAAAAA

6b ATTCTCTGCA TGCTCTAGAT AAAAATGATT ATTATTTACT GGGTCAGTTC

FIG. 2-8. 






1851 1900

6b TTAGATTTCT TTCTTTTGAG TAAAATGAAA GTAAGAAATG AAAGAAAATA 

1901 1950 

6b GAATGTGAAG AGGCTGTGCT GGCCCTCATA GTGTTAAGCA CAAAAAGGGA

1951 2000 

6b GAAAGGTAAG AGGGTAGGAA AGCTGTTTTA GCTAAATGCC ACCTAGAGTT 

2001 2050 

6b ATTGGAGGTC TGAATTTGGA AAAAAAAACT ATGTCCAGGA GAACATTAAG 

2101 2150 

6b TGTTTGAATT CATGCTCTGC TTTTGTGTTA CTGTAAACAC AAGATCAAGA 

2151 2200 

6b TTTGGATAAT CTTTTTCCTT TGTGTTTCCA ACTTAGATCA TGTCTAAATA

2201 2216 

6b TATGCTTTCA TATGGC

FIG. 2-9. 
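
The row labelled 2c in the alignment of FIG. 2 appears to be a consensus line: upper case where every aligned sequence agrees, lower case where most agree, and a period where none predominates. The following Python sketch reproduces that convention under the assumption that "most" means a strict majority; the threshold actually used to prepare the figure is not stated, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def consensus(rows: list[str]) -> str:
    """Column-wise consensus: upper case if unanimous, lower case if a
    strict majority agrees, '.' otherwise (assumed convention)."""
    out = []
    for column in zip(*rows):
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        if count == len(column):
            out.append(base)
        elif count * 2 > len(column):
            out.append(base.lower())
        else:
            out.append(".")
    return "".join(out)


# Nucleotides 51-60 of the six cDNAs other than 11a, followed by 11a.
rows_51_60 = ["TTCACTCTGG"] * 6 + ["TTCAATCTGG"]
print(consensus(rows_51_60))
```

For these seven rows the sketch yields `TTCAcTCTGG`, which matches the 2c row at positions 51 to 60; columns elsewhere in the figure may follow a different threshold.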


